Normalising data from a Reporting database

obiron

Active Member
Joined
Jan 22, 2010
Messages
469
Hi guys,

I am looking for some advice on whether I am approaching this from the right direction. This is a bit complicated so bear with me.

First declaration: I am doing something that really sticks in my craw: taking data from an SQL Data warehouse via an Access application to Excel then loading it into Access. By the time we are finished this data will have been through 5 transformations JUST to get it back to its original format :ROFLMAO: Production->SQL warehouse->access2003->excel->access2010 data import->access2010 transformation macro

I have a reporting application which provides me with a results set which contains:
8 fields which describe an asset (including AssetID, CostCentre)
3 fields which describe a payment scheme (type, method, etc..)
80 fields which descibe payment elements for that asset.

on each record the 8 asset fields and 3 payment scheme fields are completed, the asset data may be repeated because it may have more than one payment scheme
The payment elements will either be empty (NULL), have zero (cancelled charge) or a currency value.

I need to get this data into 3rd Normal form.


I have created separate tables for the assets and cost centres by doing maketable queries with the DISTINCT keyword and these tables are indexed
Asset = AssetID
CostCentre = CostCentreID

The next challenge is to create a list of chargecodes. I have done this by taking the column headings and transposing them in Excel, adding extra data columns and them importing them into an Access table (ChargeCode: PK = ID)

I now need to create a CostCentre-ChargeCode cross reference table (i'll call it C4 from now on) for every charge code that has a value against ANY asset in that charge code. I then need to create an Asset-C4 cross reference to indicate that the particular asset is attached to the charge code

For this last one I have written a VBA macro that follows this logic

open a recordset for the import file (rsIMP)
for each record in the recordset, loop through the charge fields (13-93)
If the field has a value
open a recordset for the ChargeCode (rsCHG)where the description matches the rsIMP.fields(fieldloop).name. This should ALWAYS give me one record
open a recordset for the C4 (rsC4) where CostCentre = rsIMP.fields("CostCentreID") and ChargeCodeID = rsCHG.fields("ID"). This will give me 0 or 1 records.
if rsC4.EOF = true then prepare an SQL command to insert the record and then run DoCmd.runSQL(SQLcommand) and re-run the query to get the new ID
prepare an SQL command to insert the Asset-C4 table with the AssetID and C4.ID
get next rsIMP field
get next rsImp record

This is working but it is slow. I have 57K import records with 80 fields (=4.5m loops) and probably 500K insert statments. E.g It has been running for an hour and is only 8000 records in - so it will take about 6 hours to complete.

Is there a better approach to re-normalise a flattened data structure

I am using Access2010 over Ctirix 12 using WYSE terminals so this is all running on the SuperPC cluster over a T100 ethernet network

Actual code is below
Code:
Sub aaron()
    Dim db As Database
    Dim rsImport As Recordset
    Dim rsWorking As Recordset
 
    Dim SQL As String
    Dim y As Long
    Dim ID As Long
 
 
    Set db = CurrentDb()
 
    SQL = "SELECT * FROM ImportActiveH"
 
    Set rsImport = db.OpenRecordset(SQL, dbOpenDynaset)
 
    Do While Not rsImport.EOF
 
        For y = 0 To rsImport.Fields.Count - 1
            If Left(rsImport(y).Name, 1) = "U" And Not IsNull(rsImport(y).Value) Then
                '* we are in the cost codes and have a value and need to create the additional records
                '* Get the chargecode ID for the chargeable ChargeCode.  At this stage there is only one bottom code per
                '* ActiveH code.
 
                SQL = "SELECT ID FROM ChargeCode CC WHERE CC.Description = """ & rsImport(y).Name & """"
 
                'MsgBox SQL
 
                Set rsWorking = db.OpenRecordset(SQL, dbOpenDynaset)
 
                If Not rsWorking.EOF Then '* the charge code exists
                    ID = rsWorking.Fields("ID").Value
                    '* get the bottom level ID
                    SQL = "SELECT Top 1 ID from q_BottomLevelChargeCodes as CC where CC.ParentChargeCodeID = " & ID & " order by ID"
 
                    Set rsWorking = Nothing
                    Set rsWorking = db.OpenRecordset(SQL, dbOpenDynaset)
                    If Not rsWorking.EOF Then '* the bottom level record exists
                        ID = rsWorking.Fields("ID").Value
                        '* see if the CostCentreChargeCode exists
                        If InStr(rsImport.Fields("Postcode").Value, " ") <> 0 Then
 
                            SQL = "SELECT * FROM CostCentreChargeCode as C4 WHERE C4.CostCentreID = """ & rsImport.Fields("CostCentre").Value & """" _
                               & " AND C4.CostCentreInboundPostcode = """ & Left(rsImport.Fields("Postcode").Value, InStr(rsImport.Fields("Postcode").Value, " ") - 1) & """" _
                               & " AND C4.ChargeCodeID = " & ID
                        Else '* there is a problem with some of the postcodes having a hard space instead of a soft space....
                            SQL = "SELECT * FROM CostCentreChargeCode as C4 WHERE C4.CostCentreID = """ & rsImport.Fields("CostCentre").Value & """" _
                               & " AND C4.CostCentreInboundPostcode = """ & Left(rsImport.Fields("Postcode").Value, 4) & """" _
                               & " AND C4.ChargeCodeID = " & ID
                        End If
 
                        Set rsWorking = Nothing
                        Set rsWorking = db.OpenRecordset(SQL, dbOpenDynaset)
 
                        If rsWorking.EOF Then
                            '* we need to create the costcentreChargeCode record
                            SQL = "INSERT INTO CostCentreChargeCode (CostCentreID,CostCentreInboundPostCode,ChargeCodeID) VALUES (""" & rsImport.Fields("CostCentre").Value & """" _
                                & ",""" & Left(rsImport.Fields("Postcode").Value, InStr(rsImport.Fields("Postcode").Value, " ") - 1) & """" _
                                & "," & ID & ")"
                            'MsgBox SQL
 
                            DoCmd.SetWarnings False
                            DoCmd.RunSQL SQL
 
                        Else
                            '* Capture the ID because we need it to create the AssetC4 record
                            ID = rsWorking.Fields("ID")
                        End If
 
 
 
                    Else
                        '* we have a problem
                        MsgBox "Missing Bottom level Charge Code for " & rsImport(y).Name & " for asset ID " & rsImport.Fields("U100AssetRef").Value
                    End If
 
                    '* create the AssetC4
                    If rsWorking.RecordCount = 0 Then '* we just inserted a new C4 record
                         If InStr(rsImport.Fields("Postcode").Value, " ") <> 0 Then
 
                             SQL = "SELECT * FROM CostCentreChargeCode as C4 WHERE C4.CostCentreID = """ & rsImport.Fields("CostCentre").Value & """" _
                                & " AND C4.CostCentreInboundPostcode = """ & Left(rsImport.Fields("Postcode").Value, InStr(rsImport.Fields("Postcode").Value, " ") - 1) & """" _
                                & " AND C4.ChargeCodeID = " & ID
                         Else '* there is a problem with some of the postcodes having a hard space instead of a soft space....
                             SQL = "SELECT * FROM CostCentreChargeCode as C4 WHERE C4.CostCentreID = """ & rsImport.Fields("CostCentre").Value & """" _
                                & " AND C4.CostCentreInboundPostcode = """ & Left(rsImport.Fields("Postcode").Value, 4) & """" _
                                & " AND C4.ChargeCodeID = " & ID
                         End If
 
                         Set rsWorking = Nothing
                         Set rsWorking = db.OpenRecordset(SQL, dbOpenDynaset)
                         ID = rsWorking.Fields("ID").Value
                    End If
                    DoCmd.SetWarnings False
                    DoCmd.RunSQL ("INSERT into AssetC4 (AssetID,C4ID,Ratio) VALUES(""" & rsImport.Fields("U001AssetRef").Value & """," & ID & ",1)")
 
 
                End If
 
 
 
            End If
 
        Next y
 
    lvarStatus = SysCmd(acSysCmdSetStatus, rsImport.AbsolutePosition)
    rsImport.MoveNext
 
    Loop
 
End Sub

Table Structures (irrelevant fields omitted):
Asset: AssetID (PK), Address, InboundPostcode (FK to CostCentre),CostCentre (FK to CostCentre)
CostCentre: CostCentre(PK), InboundPostCode(PK)
ChargeCode:ID(PK autonumber),Description,ParentChargeCodeID(FK to ChargeCode.ID) codes are hierachical. Data Entry codes feed into summary reporting codes. Currently there is one DE code for each REP code but in maintenance mode, the users will create additional DE codes as the asset ratios may be different.
CostCentreChargeCode: ID (PK autonumber),CostCentreID (FK to CostCentre),CostCentreInboundPostCode (FK to CostCentre),ChargeCodeID (FK to ChargeCode)
AssetC4: ID (PK autonumber), AssetID (FK to Asset), C4ID (FK to CostCentreChargeCode)
CostC4:ID (PK autonumber), C4ID (FK to CostCentreChargeCode) this is not used in the code above. This will be used to assign costs to the Cost Centre and then apportion them to the assets via the AssetC4 table
 

Excel Facts

Save Often
If you start asking yourself if now is a good time to save your Excel workbook, the answer is Yes
You're probably going to throw stuff at me when I say this, but - why are you attempting to normalize data warehouse data? Is there a specific reason? Personally, I just made use of it the way it was - realizing that it was that way for a reason - to speed up reporting. So why not use it that way?
 
Upvote 0

Forum statistics

Threads
1,215,180
Messages
6,123,502
Members
449,100
Latest member
sktz

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top