Creating new data column efficiently

BBalazs

New Member
Joined
Apr 6, 2013
Messages
11
I am seeking a way to optimize some of my code, and have run into a wall I can't seem to be able to tackle. Actually I am seeking an efficient way of populating a dataset with a new column of calculated data.
I have already exhausted my optimization possibilities with
Code:
Application.ScreenUpdating = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual

So Here it goes:
Code:
For row_number = 2 To num_rows
destws.Cells(row_number, column_number) = Format(destws.Range("A" & row_number), "dddd")
Next row_number
This is the original code.
column_number is predetermined: this is the first empty column where the fill should take place

The function itself is not the primary question, I have many more columns added with this method. Like this:
Code:
destws.Cells(row_number, column_number) = 1 / Application.WorksheetFunction.CountIfs(destws.Range("C$2:C$" & num_rows), destws.Range("C" & row_number), destws.Range("D$2:D$" & num_rows), destws.Range("D" & row_number))
Unfortunately, for more complicated formulae like the one above this code gets pretty slow once the dataset grows large (currently at 5000 rows and growing). I would also prefer using Application.WorksheetFunction because I don't need the formulae in the sheet.

I have been searching and reading a lot on this subject. Some claim that the For To Next cycle is not an effective way of creating new data columns. So I have been going for mass insertion, which is claimed to be pretty fast:

Code:
ThisWorkbook.Sheets("ActiveDataset").Range(Cells(2, column_number), Cells(num_rows, column_number)).Value = Format(destws.Range("A" & row_number), "dddd")
Please note that destws is actually the same as ThisWorkbook.Sheets("ActiveDataset") - for some reason I just needed a fully qualified reference for .Value to work.

Now as you see, the code above does not contain the For cycle, which should be good. Bad news is that it falls over row_number, which is obviously a constant - that is, it does not adjust to the proper row number.

I am in much need of a bit of insight on how to be able to create new data columns in a fast, efficient way. Thanks in advance!
 

Excel Facts

VLOOKUP to Left?
Use =VLOOKUP(A2,CHOOSE({1,2},$Z$1:$Z$99,$Y$1:$Y$99),2,False) to lookup Y values to left of Z values.
Hi,

For your first example where you are just formatting the dates differently in a separate column try:
Code:
    With destws.Cells(2, column_number).Resize(num_rows, 1)
        .Value2 = .Offset(0, 1 - column_number).Value2
        .NumberFormat = "dddd"
    End With

For cases where you need to apply a function, try using the Formula property and then converting the resulting output to values, something like this (which should return the same result as the preceding code):
Code:
    With destws.Cells(2, column_number).Resize(num_rows, 1)
        .Formula = "=Text(A2, ""dddd"")"
        .Value2 = .Value2
    End With

Another possible fast alternative would be to read the whole block of data into an array, process it and then return all the processed data back to the worksheet together. If you are not already familiar with arrays in VBA, you might find some of the following links useful:
Understanding Arrays
Visual Basic Arrays Tutorial
VBA Arrays
 
Upvote 0
Thanks for your help. It took a bit of time to figure out the correct syntax for using variables in .Formula, but things work just fine now.
Unfortunately this method doesn't seem to be much faster either. I will work out some exact calculations this afternoon to see if it helps. I think I am also willing to take the time and dig into arrays, but my question is - do you think it's actually a faster method?

I'm starting to believe the problem is with COUNTIF and COUNTIFS being slow over large volumes of data - having to look through now 7500 rows of data is time, and if that procedure runs 7500 times it is most likely exhausting.
 
Upvote 0
Hi,

I think it depends on what you are trying to achieve. The procedure should only run once rather than 7500 times, although I'm not sure I understand what you mean exactly.

Perhaps post back with more context, showing how you are using the code, and also maybe samples of your data and expected outputs and hopefully someone can help further.
 
Upvote 0
Thanks for your help. Right now I can't give sample data, but I will try to describe the problem as much as possible.

I have a Dataset pulled from a csv file, which is updated on a daily basis. The dataset is growing steadily with about 300-400 new rows/day. This dataset is first imported, then stored on a sheet called "ActiveDataset" by a VBA macro.

There are a number of pivot tables based on this Dataset, and unfortunately they need some extra columns not available directly in the dataset. This means that I have to run a macro which adds new columns to the Dataset on an automated basis, right after data is imported. The code you have seen before does just that.

Code available here:
Code:
Sub AddMeasures()
Dim column_number As Integer
Dim row_number As Integer
Dim num_rows As Integer
Dim destws As Worksheet

Set destws = ThisWorkbook.Sheets("ActiveDataset")
num_rows = destws.UsedRange.Rows.Count

'DAY OF WEEK
Application.StatusBar = "Adding Day of Week"
column_number = destws.Cells.Find(What:="*", After:=Range("IV1000000"), SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column + 1
destws.Cells(1, column_number) = "DayOfWeek"
With destws.Cells(2, column_number).Resize(num_rows, 1)
    .Value2 = .Offset(0, 1 - column_number).Value2
    .NumberFormat = "dddd"
End With

'WEEKDAY COUNT
Application.StatusBar = "Adding Day of Week Count"
column_number = destws.Cells.Find(What:="*", After:=Range("IV1000000"), SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column + 1
destws.Cells(1, column_number) = "DoWCount"
With destws.Cells(2, column_number).Resize(num_rows, 1)
    .Formula = "=1/COUNTIF(" & "A$2:A$" & num_rows & ", A2)"
    .Value2 = .Value2
End With

'TAG COUNT
Application.StatusBar = "Adding Tag Count"
column_number = destws.Cells.Find(What:="*", After:=Range("IV1000000"), SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column + 1
destws.Cells(1, column_number) = "TagCount"
With destws.Cells(2, column_number).Resize(num_rows, 1)
    .Formula = "=1/COUNTIFS(" & "C$2:C$" & num_rows & ", C2, " & "D$2:D$" & num_rows & ", D2)"
    .Value2 = .Value2
End With

End Sub

For now I have to add three columns. One creates a Day of Week variable, and the two others are necessary for the Pivot Tables to weigh data properly.
It is the last two columns - particularly the last one - that take a somewhat long time to add. For now it is tolerable, but as the dataset grows continuously, it won't be long before calculation times spiral out of control.

Thanks in advance!
 
Upvote 0
An approach you might consider is to not add extra columns to the dataset, just create the extra field via SQL. So instead of current approach
Code:
SELECT fields
FROM table
Use
Code:
SELECT fields, formula_or_whatever AS [new field]
FROM table
This can be created once and then it is a simple pivot table refresh. Set it up manually from a new workbook, ALT-D-P to start the PT wizard, choose external data source at the first step, follow the wizard. Google for examples. HTH. regards
 
Upvote 0
Unfortunately for now, I don't think I can query the source CSV file. Nevertheless, this option will come in handy when migrating our data to a real database, which is bound to happen in something like a month.

I will be looking for examples on this, but seeing that you are truly an expert in this field, can you provide me with some help on what the "formula_or_whatever" parameter might look like? Just to get me started on the subject. For example, what would be the right syntax for the following?
=1/COUNTIF(AllRecordsFor[Date],CurrentRowRecordFor[Date])

This already means a lot of help to me, because you are providing me with insights on what direction I should head with my research. Thanks!
 
Upvote 0

Forum statistics

Threads
1,215,903
Messages
6,127,652
Members
449,395
Latest member
Perdi

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top