Updating Records - bloated database

HedgePig

I'm having a problem with my database size becoming bloated. Here's the story - all of this is being done within VBA.

I have an empty table which I append records to by means of an INSERT query. I really only append skeleton information, enough to have a unique identifier - Customer ID & Transaction ID. All the other fields are null.

Once this is done, I loop through all the records in this target table. I pick up information from another table and update individual fields in the target table based on what I find in the other table. Altogether there are around 70,000 records and perhaps 20 fields in each record that are updated.
This seems to cause the database to bloat in size - and in fact exceed the 2GB limit. Compacting the database shrinks it back down to a much more reasonable size.

Why would editing individual fields cause the database size to bloat so much? Is it because Access is storing the previous values so the updates could be undone? If so, is there any way to prevent this?

Of course I'm by no means certain that it is the editing of the fields that is causing this bloat, but I have done some testing and this seems to be the cause. I tried populating the fields in the first step (with the INSERT query), and then only editing the ones that needed changes. This produces the same end result but requires far less editing - and the bloat is much smaller, although it's still quite big.

I'd love to hear any suggestions as to what is causing this and how I could avoid it.

In case it's relevant, I'm using Access 2007 under Windows XP.
 

Good advice. Also, insert and delete operations will cause the database to hold a "shadow" copy while the edits happen... maybe Updates do that too.

If your database bloats regularly, you can set an option to Compact on Close. I tend to do that for analytical databases like the one you are describing.
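
If you want to flip that setting from code, something like this should do it - a minimal sketch, and "Auto Compact" is what I believe (but haven't double-checked on 2007) to be the option string behind the Compact on Close checkbox:
Code:
'Turns on Compact on Close for the current database.
'The option name "Auto Compact" is an assumption - verify it on your version.
Sub EnableCompactOnClose()
   Application.SetOption "Auto Compact", True
End Sub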

I guess there's no need to warn you about regular backups...

Denis
 
Hello Derek and SydneyGeek

Thanks for your replies - much appreciated. In this case I don't think that failing to close the recordset is the cause of the problem. I'm also beginning to suspect that the updating is _not_ in fact the cause of the bloat, but rather my use of collections. I'm using them a LOT within the code and possibly I am not freeing up the memory they use - although I can't see what I'm doing wrong.

I'm posting some sample code. It is a simplification of the "real" code - but running it produces exactly the same problem. If I start with a 10Mb database, running the code once causes it to expand to 50Mb - and compacting it shrinks it back down to 10Mb.

There are 2 tables in the database:
1) A source table with Customer ID, Transaction ID and 6 further fields (which contain random data for testing purposes). Each customer ID can have several transaction IDs.
2) A Target table which contains Customer ID and the same 6 further fields as occur in the source table.

The aim of the code is to take the source table and create a list of unique customer IDs in the target table.
Once this is done, the code loops through the source data and updates the remaining fields in the target table with the value that occurs most often for each field for a particular customer ID. (In the "real" app, the rules for updating each field may be more complicated.)
To track how many times a value occurs in a particular field, I'm using a collection. Each distinct value is added to the collection and a counter is incremented each time it occurs. Once I have finished with all the transaction records for the current customer, I clear the collection (I think!) and repeat the process for the next customer.

If you do have any suggestions as to what could be causing the problem I'd be very grateful!

Code:
Option Compare Database

'Creates a unique list of customer IDs (selected from the source table) and appends them to the target table
'Then loops through the source table and updates the remaining fields in the target table
'For a given customer ID, each field is updated with whatever value occurs most often for that customer ID in the source table

Sub MainRoutine()

Dim TgtRS As Recordset  'Contains the "summarised" output - each customer ID has only one record
Dim SrcRS As Recordset  'Contains source data - each customer ID can have many transactions

   CurrentDb.Execute "DELETE * From TargetTable;"
   'Insert just the customer ID's into the target database.
   'Later on populate the other fields
   CurrentDb.Execute "INSERT INTO TargetTable (CustomerID) " & _
                     "SELECT SourceTable.CustomerID " & _
                     "FROM SourceTable GROUP BY SourceTable.CustomerID;"
  
  Set TgtRS = CurrentDb.OpenRecordset("SELECT CustomerID, Field3, Field4, Field5, Field6, Field7, Field8 FROM TargetTable ORDER BY CustomerID")
  
  'Now populate the other fields
  Set SrcRS = CurrentDb.OpenRecordset("SELECT CustomerID, TransactionID, Field3, Field4, Field5, Field6, Field7, Field8 FROM SourceTable ORDER BY CustomerID, TransactionID")
  Call UpdateRecords(TgtRS, SrcRS)

  
  TgtRS.Close
  SrcRS.Close

  Set TgtRS = Nothing
  Set SrcRS = Nothing

End Sub

Sub UpdateRecords(TgtRS As Recordset, SrcRS As Recordset)
Dim F3cln As New Collection   'Collection to hold values that occur in field 3 and the number of times each value occurs
Dim F4cln As New Collection
Dim F5cln As New Collection
Dim F6cln As New Collection
Dim F7cln As New Collection
Dim F8cln As New Collection

   Do While Not SrcRS.EOF
      Do While TgtRS!CustomerID = SrcRS!CustomerID
         
         Call UpdateCln(F3cln, SrcRS!Field3)
         Call UpdateCln(F4cln, SrcRS!Field4)
         Call UpdateCln(F5cln, SrcRS!Field5)
         Call UpdateCln(F6cln, SrcRS!Field6)
         Call UpdateCln(F7cln, SrcRS!Field7)
         Call UpdateCln(F8cln, SrcRS!Field8)
         
         SrcRS.MoveNext
         If SrcRS.EOF Then Exit Do
      Loop
      Call UpdateField(TgtRS, F3cln, TgtRS!Field3)
      Call UpdateField(TgtRS, F4cln, TgtRS!Field4)
      Call UpdateField(TgtRS, F5cln, TgtRS!Field5)
      Call UpdateField(TgtRS, F6cln, TgtRS!Field6)
      Call UpdateField(TgtRS, F7cln, TgtRS!Field7)
      Call UpdateField(TgtRS, F8cln, TgtRS!Field8)
            
      Call EmptyCln(F3cln)
      Call EmptyCln(F4cln)
      Call EmptyCln(F5cln)
      Call EmptyCln(F6cln)
      Call EmptyCln(F7cln)
      Call EmptyCln(F8cln)
      
      TgtRS.MoveNext
      If TgtRS.EOF Then Exit Do
   Loop
   
End Sub


'Increases the count for the occurrence of the specific itemval.
'If it hasn't occurred before, an error is raised and the error handler adds it to the collection
Public Sub UpdateCln(cln As Collection, itemval As Variant)
'Static citem As clnitemClass
Static cstritem As String

cstritem = CStr(itemval)
   On Error GoTo missingitemerr
   cln(cstritem).icount = cln(cstritem).icount + 1

Exit Sub

missingitemerr:
   'citem.itemval = itemval
   cln.Add New clnItemClass, CStr(itemval)
   cln(cstritem).itemval = itemval
   On Error GoTo 0
Resume
End Sub

'Picks the value which occurs the most often and updates the field with that value
Public Sub UpdateField(RS As Recordset, cln As Collection, fld As Field, Optional minchanges As Integer = 1)
Dim modval As Variant
   
   If cln.Count >= minchanges Then
      modval = FindMaxOccVal(cln)
      RS.Edit
      fld = modval
      RS.Update
   End If

End Sub

'Returns the field value that occurs most often in a collection (ties on occurrence count return the value checked last)
Function FindMaxOccVal(cln As Collection) As Variant
Static i As Integer
Static MaxOcc As Integer    'highest number of occurrences
Static MaxOccPos As Integer 'position in the collection where the maximum occurrence count occurs
   MaxOcc = -1
   MaxOccPos = -1
   For i = 1 To cln.Count
      If cln(i).icount >= MaxOcc Then
         MaxOcc = cln(i).icount
         MaxOccPos = i
      End If
   Next
   FindMaxOccVal = cln(MaxOccPos).itemval

End Function
Sub EmptyCln(cln As Collection)
   Do While cln.Count > 0
      'Set cln(1) = Nothing
      cln.Remove (1)
   Loop
   
End Sub
This is the code for the clnItemClass.
Each instance of this class holds a value that occurs in a field and the number of times it occurs.
Code:
Option Compare Database

Public icount As Long      'number of times the value has occurred
Public itemval As Variant  'the field value being counted
 
Hi HedgePig,

You may be emptying the collections but I don't think you are releasing them. See if this helps...
Code:
Sub EmptyCln(cln As Collection)
   Do While cln.Count > 0
      cln.Remove 1
   Loop
   Set cln = Nothing
End Sub
Denis
 
Hello Denis - thanks for your suggestion

Actually I did have "Set cln = Nothing" in exactly the place you suggested. However I removed it because I thought that although I had emptied the collection, it would still be used again for the next set of transactions (for the next CustomerID). I thought it should only be set to Nothing once I'm completely finished with the collection. I was actually surprised the code worked with the "Set cln = Nothing" in.

Having said that, the code did work and it seems to make no difference whether or not I have that line in! I don't understand this.
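
My best guess as to why that line makes no difference - and it is only a guess - is VBA's auto-instancing: the collections are declared with "Dim ... As New Collection", so even after one is Set to Nothing, the very next reference quietly creates a fresh, empty collection. A tiny sketch of the behaviour I mean:
Code:
Sub AutoInstanceDemo()
Dim c As New Collection   '"As New" makes the variable auto-instancing

   c.Add "first"
   Set c = Nothing        'releases the current collection object...
   c.Add "second"         '...but this line quietly creates a brand-new one
   Debug.Print c.Count    'prints 1 - no error is raised
End Sub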

Still baffled
HedgePig
 
Hmmm.....I take back what I said about the collections being the cause of the problem. At least for now!

I find that if I leave everything unchanged except that I comment out the 3 lines that actually update the fields in the UpdateField routine
RS.Edit
fld = modval
RS.Update
then I don't get any bloat at all!

This seems to shift the blame back to the editing of the fields. I wonder what is happening here?
 
I've tried another simple experiment. This time I created a database with a single table that has 5 text fields: Field1, Field2, Field3, Field4 and Field5.

In my test run I had just under 10,000 records in this table. The database was around 1.3 Mb in size.

Running the following code _sometimes_ causes the database to bloat enormously - to around 38 Mb. Compacting the database restored it to its former slim self.

I am (still) completely baffled as to what is happening and why!

Code:
Sub Main()

Dim cln As New Collection
Dim RS As Recordset
Dim i As Long

   Set RS = CurrentDb.OpenRecordset("SELECT Field1, Field2, Field3, Field4, Field5 from Table1;")
   RS.MoveFirst
   For i = 1 To 10
      Do While Not RS.EOF
         RS.Edit
            RS!field1 = i & " ABCDEFGHIJKLMNOPQRSTUVWXYZ " & Chr(65 + Rnd() * 26)
            RS!Field2 = i & " HORRIBLYBLOATEDDATABASE " & Chr(65 + Rnd() * 26)
            RS!Field3 = i & " Field 4 " & Chr(65 + Rnd() * 26)
            RS!Field4 = i & " Field 5 " & Chr(65 + Rnd() * 26)
            RS!Field5 = i & " Field 6 " & Chr(65 + Rnd() * 26)
         RS.Update
         RS.MoveNext
      Loop
      RS.MoveFirst
   Next
   RS.Close
   Set RS = Nothing
End Sub
 
I think it's the 'shadow copy' that I mentioned before. Edit a record, and Access seems to edit a copy of that record until you commit (save) it. Then the new version becomes the record. Not a big deal in a transactional database; a much bigger deal if you change thousands of records at once. Your last test made 10 passes over roughly 10,000 records, so as far as Access is concerned you made about 100K edits and left about 90K old record versions in "holes" that need removing with a compact & repair. Of course the math doesn't fully account for the file size; there must be some additional overhead identifying and tracking those edited records, and some of that will likely be in the system tables.

So, compact on close. It's still the best option I've found.
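
If you ever need to compact on demand from code instead, Access has a CompactRepair method - though it only works on a file other than the one you're running in. A rough sketch (the paths are just examples):
Code:
'Compacts an external database file into a new, slimmer copy.
'You cannot compact the database that is currently open in Access.
Sub CompactOtherDb()
   If Application.CompactRepair("C:\Data\Bloated.accdb", "C:\Data\Slim.accdb") Then
      Debug.Print "Compacted OK"
   End If
End Sub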

Denis
 
Hello Denis

Your comment about a shadow copy being held sounds plausible.
I did try fiddling around with BeginTrans and CommitTrans. Although the code executed, it didn't make any difference. Having said that, I didn't really know what I was doing, so I could well have implemented it incorrectly.
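
For what it's worth, this is roughly what I tried - a sketch only, and quite possibly not how it should be done. Recordsets opened via CurrentDb should take part in the default workspace's transaction:
Code:
'Rough sketch of wrapping the edits in a DAO transaction.
Sub UpdateInsideTransaction()
Dim ws As DAO.Workspace
Dim RS As DAO.Recordset

   Set ws = DBEngine.Workspaces(0)
   ws.BeginTrans
   Set RS = CurrentDb.OpenRecordset("SELECT Field3 FROM TargetTable")
   Do While Not RS.EOF
      RS.Edit
      RS!Field3 = "some value"   'placeholder - the real update rule goes here
      RS.Update
      RS.MoveNext
   Loop
   ws.CommitTrans               'all the edits are committed in one go
   RS.Close
   Set RS = Nothing
End Sub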

I also posted this on another forum and one suggestion was to try updating directly using a SQL UPDATE statement. This seems to solve the problem! The speed hit also didn't seem to be too bad. Although I still have to try this in my "real" database, I'm 99% sure it will solve the problem.
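
In case it helps anyone else, the idea is to replace the Edit/Update pair with something along these lines (CustID and modval are just illustrative stand-ins for the values computed in the loop):
Code:
'Update via SQL instead of Recordset.Edit - one statement per change.
'Real code would need to escape any quotes inside modval.
CurrentDb.Execute "UPDATE TargetTable SET Field3 = '" & modval & "' " & _
                  "WHERE CustomerID = " & CustID, dbFailOnError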

Thanks again for all your help and suggestions.

Regards
HedgePig
 
