Replacing array listing formula (INDEX(MIN...) with more efficient solution

lindstroem

New Member
Joined
Mar 2, 2015
Messages
39
Office Version
  1. 2016
Been a while but now i've once more hit a calculation efficiency wall. It's always sad that the most elegant and compact formula decisions that often leads to higher calculation times.

File-description and end goal:
Inputdata consisting of up to 50.000 data rows contains a maximum of 5000 data points that is to be found and listed in a separate sheet. The datarow of interest looks like this and can be identified by that the lines of interest always start with "20". This dataline along with the cell beneath this is listed in the separate sheet. I have managed this with 1xarrayformula per row which results and a formula that then identifies the row number of the array formula result and adds +1 on the row to get the row beneath

Please help with improving calculation speed.
Does anyone have any tips on how to remove the need for the array-formula or making it faster? Creating a better identifier on the first page with relevant row data that can be used as direct reference in the second sheet?
Very much open for suggestions and input!

Simplified input data example (Sheet1)
200000844506 00000000013520003030201000476529
50 ANDERSSON
345345220202
200000844507 00000000013520003030201000476529
50 JOHANSSEN
345345220345345934202
200000844508 00000000013520003030201000476529
50 MARK
23492342342938

End goal - Finished list in the a new sheet (Sheet 2)
200000844506 0000000001352000303020100047652950 ANDERSSON
200000844507 0000000001352000303020100047652950 JOHANSSEN
200000844508 0000000001352000303020100047652950 MARK

Array formula per line currently used
{=INDEX(SHEET1!$A$2:$A$50022,MIN(IF(LEFT(SHEET1!$A$2:$A$50022,2)="20",ROW(SHEET1!$A$2:$A$50022)-ROW(SHEET1!$A$2)+1),ROWS(SHEET1!$A$2:$A2)))}
 

Excel Facts

How to find 2nd largest value in a column?
MAX finds the largest value. =LARGE(A:A,2) will find the second largest. =SMALL(A:A,3) will find the third smallest
See if this is any better, note that D$2:D2 should refer to the cell that the first formula goes in before filling down. D1 should refer to the cell above the formula.

=IF(D1="","",IFERROR(INDEX(SHEET1!$A:$A,AGGREGATE(15,6,ROW(SHEET1!$A$2:$A$50022)/(LEFT(SHEET1!$A$2:$A$50022,2)="20"),ROWS(D$2:D2))),""))

If you don't actually have 50022 rows of data, a dynamic named range would improve performance.
 
Upvote 0
update you profile (Account details) about Excel version, less troubles and wasted posts

then ad rem
assuming that the structure is the same all the time and this is a representative example
you can try Power Query
rawraw50
200000844506 00000000013520003030201000476529200000844506 0000000001352000303020100047652950 ANDERSSON
50 ANDERSSON200000844507 0000000001352000303020100047652950 JOHANSSEN
345345220202200000844508 0000000001352000303020100047652950 MARK
200000844507 00000000013520003030201000476529
50 JOHANSSEN
345345220345345934202
200000844508 00000000013520003030201000476529
50 MARK
23492342342938

Code:
// Table1
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    Coditions = Table.AddColumn(Table.TransformColumnTypes(Source,{{"raw", type text}}), "50", each if Text.Contains([raw], "50 ") then [raw] else null),
    Result = Table.AlternateRows(Table.FillUp(Coditions,{"50"}),1,2,1)
in
    Result
then load result to the sheet what you want
 
Last edited:
Upvote 0
update you profile (Account details) about Excel version, less troubles and wasted posts

then ad rem
assuming that the structure is the same all the time and this is a representative example
you can try Power Query
rawraw50
200000844506 00000000013520003030201000476529200000844506 0000000001352000303020100047652950 ANDERSSON
50 ANDERSSON200000844507 0000000001352000303020100047652950 JOHANSSEN
345345220202200000844508 0000000001352000303020100047652950 MARK
200000844507 00000000013520003030201000476529
50 JOHANSSEN
345345220345345934202
200000844508 00000000013520003030201000476529
50 MARK
23492342342938

Code:
// Table1
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    Coditions = Table.AddColumn(Table.TransformColumnTypes(Source,{{"raw", type text}}), "50", each if Text.Contains([raw], "50 ") then [raw] else null),
    Result = Table.AlternateRows(Table.FillUp(Coditions,{"50"}),1,2,1)
in
    Result
then load result to the sheet what you want

Hello and thank you, I've updated the Excel-version.

The structure varies somewhat so no clear pattern in the data (i.e. 50 can be 1st/2nd/3rd.. row behind the key data which begins with "20", but the 20-row is the most important so that can be the main one and if the subsequent posts are not "50 Name" that can be forgiven. I've never used powerquery so I'm a novice in term of how to implement that.
 
Upvote 0
See if this is any better, note that D$2:D2 should refer to the cell that the first formula goes in before filling down. D1 should refer to the cell above the formula.

=IF(D1="","",IFERROR(INDEX(SHEET1!$A:$A,AGGREGATE(15,6,ROW(SHEET1!$A$2:$A$50022)/(LEFT(SHEET1!$A$2:$A$50022,2)="20"),ROWS(D$2:D2))),""))

If you don't actually have 50022 rows of data, a dynamic named range would improve performance.

Hi and thank you for your response.
Genius to just add the simple IF-formula so spare loads of resources as you correctly assume that I in most cases dont have that many rows. By just adding that IF-check the loading time is reduced to just a few seconds, so that might be enough for the document. I've not tried the complete formula-suggestion due to this (and the matter being urgent), but I will make sure to update the formula to try the aggregate-version during the week to see if it brings further improvements.

Thanks again and I hereby consider the question solved and thank both you and sandy666 for your time and effort.
 
Upvote 0
could you post more representative example?

Hi!
The suggestions from jasonb75 solved the issue quite well, but if you wish to provide an update solution suggestion from an academic and learning stand point I welcome it seeing as I am unfamiliar with Powerquery. However to pressure :)

See the below example for a more representative selection how of the data may appear. The most common is that the 50-series with name is directly followed after the #20..... series that is the key series to fetch, but not always. But as said, in those cases it can be fine that it does not show the correct 50-data just as long as the 20-data is fetched correctly.
203071045300 00000243543523452435
40REFERENS: 231423412341234 Ev retur återbet till Bg 5871-4353
50 Name1
51Adress
52856 53 SUNDSVALL
425435245245 00000002435243524352435
61SV HANDELSBANKEN AB, INTERNETBET FÖRETAG,CCMF
6320540 MALMÖ
200000134502 00000234523452345
40REFERENS: 0000134502 Ev retur återbet till Bg 234234234
50 NAME AGAIN
51Adress 234234
52852 38 SUNDSVALL
123412341324 132413241234
61SV HANDELSBANKEN AB, INTERNETBET FÖRETAG,CCMF
6320540 MALMÖ
2039709713 000000000234523452345345
40Fakturanr 123413241234
50NAME UL. MORE NAME
656-3332 KRAKOW PL PL
61BANK POLSKA KASA OPIEKI SA - BANK P
70000000000000000SEK asdfsdfadsfdsf
200000226803 000002345324523453245
50 RELEVANT NAME

 
Upvote 0
so this result is ok?
raw50
203071045300 0000024354352345243550 Name1
200000134502 0000023452345234550 NAME AGAIN
2039709713 00000000023452345234534550NAME UL. MORE NAME
200000226803 00000234532452345324550 RELEVANT NAME
 
Upvote 0
I'm not seeing how your original formula can work, using MIN, the array is always going to return the first entry that starts with "20".
Using the AGGREAGTE method, the formula is progressive and will return subsequent matches as it is filled down.
Below, column C uses your original formula, column D uses my suggestion from earlier, column E uses that suggestion with a dynamic named range which will make it even more efficient.
Without this, the array processes all rows in the range Sheet1!A2:A50022, even if they are empty. With the dynamic range, if the last entry is in A2022 then only A2:A2022 will be processed in the array (a 96% reduction in effort).

Cell Formulas
RangeFormula
C2:C10C2=INDEX(Sheet1!$A$2:$A$50022,MIN(IF(LEFT(Sheet1!$A$2:$A$50022,2)="20",ROW(Sheet1!$A$2:$A$50022)-ROW(Sheet1!$A$2)+1),ROWS(Sheet1!$A$2:$A2)))
D2:D10D2=IF(D1="","",IFERROR(INDEX(Sheet1!$A:$A,AGGREGATE(15,6,ROW(Sheet1!$A$2:$A$50022)/(LEFT(Sheet1!$A$2:$A$50022,2)="20"),ROWS(D$2:D2))),""))
E2:E10E2=IF(E1="","",IFERROR(INDEX(Sheet1!$A:$A,AGGREGATE(15,6,ROW(Index_Range)/(LEFT(Index_Range,2)="20"),ROWS(E$2:E2))),""))
Press CTRL+SHIFT+ENTER to enter array formulas surrounded with curly braces.
Named Ranges
NameRefers ToCells
Index_Range=Sheet1!$A$2:INDEX(Sheet1!$A:$A,MATCH("ZZZ",Sheet1!$A:$A))C2:E10
 
Upvote 0
so this result is ok?
raw50
203071045300 0000024354352345243550 Name1
200000134502 0000023452345234550 NAME AGAIN
2039709713 00000000023452345234534550NAME UL. MORE NAME
200000226803 00000234532452345324550 RELEVANT NAME
That result is perfect :)
 
Upvote 0

Forum statistics

Threads
1,214,999
Messages
6,122,645
Members
449,093
Latest member
Ahmad123098

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top