Checking column headers on data extract - is there a better way

JugglerJAF

Active Member
Joined
Feb 17, 2002
Messages
264
I receive a number of data files each month from a number of different sources, and while they are all supposed to be in a standard layout (column headers and column order), they often aren't.

I've written some code that checks to see if the values on the sheet match to either...
- the standard layout (in which case no changes are required)
- a known incorrect layout (in which case, I know what actions are required to get the file back to standard layout - moving/deleting/adding/renaming columns)
- a new unknown layout (in which case I display an error message)

The code I have (simplified of course, the actual files have almost 60 column headers) is shown below, but it's a bit of a pain to populate all of the new values whenever a new data file layout turns up (roughly one new layout every 3 or 4 months on average).

Is there an easier/more efficient/quicker to add new variations/better way of doing this?

Code:
Sub data_layout_checks()
'v0: standard layout
If UCase([A1]) = "NAME" And UCase([B1]) = "REGION" And UCase([C1]) = "AGE" And UCase([D1]) = "SALES" Then
    MsgBox "Standard layout.  No changes required."
    Exit Sub
End If
'v1: known incorrect version 1
If UCase([A1]) = "NAME" And UCase([B1]) = "REGION" And UCase([C1]) = "SALES" And UCase([D1]) = "AGE" Then
    Columns("D:D").Cut
    Columns("C:C").Insert Shift:=xlToRight
    MsgBox "Converted to standard layout (v1)."
    Exit Sub
End If
'v2: known incorrect version 2
If UCase([A1]) = "SALES" And UCase([B1]) = "DEPARTMENT" And UCase([C1]) = "REGION" And UCase([D1]) = "EMPLOYEE NAME" And UCase([E1]) = "AGE(Y)" Then
    Columns("B:B").Delete Shift:=xlToLeft
    Columns("A:A").Cut
    Columns("E:E").Insert Shift:=xlToRight
    Columns("A:A").Cut
    Columns("C:C").Insert Shift:=xlToRight
    [A1].Value = "Name"
    [C1].Value = "Age"
    MsgBox "Converted to standard layout (v2)."
    Exit Sub
End If
'error if unknown layout
    MsgBox "Unknown layout.  Unable to fix automatically."
End Sub
 

Fluff

MrExcel MVP, Moderator
Joined
Jun 12, 2014
Messages
30,317
Office Version
365
Platform
Windows
If the headers are correct, but the order is not try
Code:
Sub SetCols()
   Dim ColAry As Variant
   Dim i As Long, lc As Long
   Dim Fnd As Range
   
   ColAry = Array("NAME", "REGION", "AGE", "SALES")
   
   Application.ScreenUpdating = False
      
   For i = 0 To UBound(ColAry)
      Set Fnd = Range("1:1").find(ColAry(i), , , xlWhole, , , False, , False)
      If Not Fnd Is Nothing Then
         If Fnd.Column <> i + 1 Then
            Fnd.EntireColumn.Cut
            Columns(i + 1).Insert Shift:=xlToRight
            Application.CutCopyMode = False
         End If
      End If
   Next i
   lc = Cells(1, Columns.count).End(xlToLeft).Column
   If lc > UBound(ColAry) + 1 Then Range(Cells(1, UBound(ColAry) + 2), Cells(1, lc)).EntireColumn.Delete
End Sub
 

JugglerJAF

Active Member
Joined
Feb 17, 2002
Messages
264
It could be that the headers are correct, but not in the right order, but it could also be:
- wrong headers in the right order
- wrong headers in the wrong order
- missing columns that are not required (these are added in and populated with null)
- additional columns that are not required (these are deleted)
- any combination of all of the above

I can't think of a better way to do it than my current approach. It works, but it's a just a pain when I have to add in a new variation with around 60 "and" criteria.
 

Fluff

MrExcel MVP, Moderator
Joined
Jun 12, 2014
Messages
30,317
Office Version
365
Platform
Windows
Depending on what the headers are, it may be possible to rename them.
For instance if you only have one header that contains the word "Name" any instance of "Employee name", "First name", "Full name" etc could easily be changed to "Name"
 

JugglerJAF

Active Member
Joined
Feb 17, 2002
Messages
264
There *should* only be one instance of each column name, although the same column can have different names from extract to extract depending on who produces it. I do have a list of standard column names and the known variations, so I can easily replace a "known incorrect" to make it "correct", but I have had instances where the same column has been duplicated (which is fine as long as I know in advance and I can then just delete one of the surplus instances), but there will be instances of missing columns and surplus columns to deal with as well.

I need a way of saying "If the file is in "layout A", then that's fine and no changes are required, but if it's in "layout B" then I know that I need to do actions a, b, and c to make it standard, but if it's in "layout C" then I know that I need to do actions b, c, d, and e to make it standard (and so on for all known variations which are "fixable - i.e. have all mandatory columns in them somewhere so that I can work out what needs to go where, what needs to be renamed, what needs to be deleted, what needs to be inserted and so on so I can write that into the code), but if it's not standard or any of the known "fixable" variations, then I display a warning that the file cannot be fixed by the macro and that manual checking is required. Once manually checked, if it's got all the mandatory columns, I can then add that new variation to the code with instructions on how to fix it, so that if that particular version appears again, it then knows that while it's not standard, it does know what to do to make it standard.

The code I have does do this, I was just wondering if there was a better way of doing it given all the typing of the headers provided in each column for the new variations.

I had toyed with the idea of comparing arrays, but populating those was just as fiddly as listing all of the "A1 = name", "B1 = Age" and so on, and there was also the issue that the number of entries in each array wasn't necessarily always going to be the same which made array comparisons difficult. For example, if standard is "NAME", "REGION", "AGE", "SALES", but "AGE" is not a mandatory field, then a file that just contained NAME", "REGION", "SALES" (in any order, as long as it's an order I've seen before and know how to fix), is fine, but then I'd be comparing a 3 entry array (from the data file) to a 4 entry array (the standard column headers).

I think it probably best to just stick with what I have, but thanks for your suggestions.
 

Fluff

MrExcel MVP, Moderator
Joined
Jun 12, 2014
Messages
30,317
Office Version
365
Platform
Windows
Ok, thanks for the feedback
 

Forum statistics

Threads
1,077,991
Messages
5,337,591
Members
399,156
Latest member
RaudMees

Some videos you may like

This Week's Hot Topics

Top