Delete duplicated columns

chouba

New Member
Joined
Apr 20, 2021
Messages
9
Hello,
I m working on VBA code to delete columns if they are completely identical. I have found some solutions but most of them takes too much time.
Here are some information :
- My columns dont have header
- I need an efficient vba code that can handle 10 000+ columns in fastest way( maybe work with a variable that contains the range and than paste it back on the sheet)
- I want to keep the first instance of each columns ( If we have 2 identitcal columns we need to keep one of them )
- I dont want to check specific columns but the entire sheet
- I cannot provide an exemple but if more informations are needed I can answer you

Thanks a lot for your help :)
 

Excel Facts

Which came first: VisiCalc or Lotus 1-2-3?
Dan Bricklin and Bob Frankston debuted VisiCalc in 1979 as a Visible Calculator. Lotus 1-2-3 debuted in the early 1980's, from Mitch Kapor.
Hi,​
as Excel is not a database software so it can't be as fast as any database software …​
As there is no header - breaking Excel rules so it seems Dumb or Dumber was here ! - so obviously that slows down the process …​
Without any sample - as it may depend on how smart is the worksheet design and as you can provide a sample workbook -​
you must at least very well detail and elaborate your need …​
 
Upvote 0
Hello thank you for your answer !

Here is an example of my sheet with all the data types and the strucure (I didnt found better Idea to anonymize ? )
1621949440614.png
1621949440614.png
 
Upvote 0
Have you tried the built-in "Remove Duplicates" functionality that Excel has?
I am not sure if it can handle 10,000 columns worth of data, but I think that may be your best bet.

As was mentioned, Excel is not really a "database" software (like Access, SQL, or Oracle), so it may not handle this very well regardless of which method you try, for that amount of data.
Perhaps you may be able to do something with Power Query, as that kind of incorporate database logic to Excel workbooks.
 
Upvote 0
Hello Joe4,
It is part of an automated process so it needs to work on a macro.

I have this macro from a similar project that compares columns header and delete columns if there is a match. It is able to handle big amount of data and is pretty fast.
This example shows the kind of function I need :

VBA Code:
With CreateObject("scripting.dictionary")
   For Each Cl In Range("A1", Cells(1, Columns.Count).End(xlToLeft))
   truc = Cells(1, Columns.Count).End(xlToLeft)
      If Not .exists(Cl.Value) Then
         .Add Cl.Value, Nothing
      Else
         If rng Is Nothing Then Set rng = Cl Else Set rng = Union(rng, Cl)
      End If
   Next Cl
End With
If Not rng Is Nothing Then rng.EntireColumn.Delete

I tried to make it wrok to compare the whole columns but it didn't work. But maybe some VBA expert can halp me :D
 
Upvote 0
If you can link a workbook on a files host website like Dropbox for example …​
But don't howl at the Moon, the more columns, the more rows to check, the slower !​
 
Upvote 0
Where is this data coming from? I am willing to bet that you probably have already tried to see if there is a way to address this at the data source (i.e. why is it producing duplicate columns in the first place).

You mentioned that you may have 10000 columns of data. How many rows do you typically have?
If the number of rows is significantly less than the number of columns, perhaps it would be better to first transpose all the data (to make columns rows, and rows columns), use Excel's built-in "Remove Duplicates" functionality, and then transpose the data back.
 
Upvote 0
Solution
Marc L :
- I cannot provide the file as it is from my company and I cannot share it but with the example it shows how the file looks like (in terms of data structure).
- I know that increasing the data to check creates more time. This is why I am looking for a macro that could be faster than built in function.

Joe4 :
- The data comes from a db export then it is transformed through other macros and then I have to delete the duplicates, in this case it is normal to have duplicates from the process it is not an anomaly.
- I will have same amount of rows too but i succeed to remove the duplicates from them. I considerate transposing but its is not effective.
- Again I dont want to use built in function but it needs to go trough a macro so my user will "only" have to press a button.


The vba code example that I have linked before works great and needs a slight change (in my opinion) to check the whole column and not the first cells of each.

I'm still discovering vba coding but I believe it can handle such a task !

Thank you again for you time :)
 
Upvote 0
- Again I dont want to use built in function but it needs to go trough a macro so my user will "only" have to press a button.
Note that most "built-in" functions can be automated in VBA too! You can often get the code you need to do this by turning on the Macro Recorder, and record yourself performing that task manually, then stopping the Macro Recorder and viewing the code.
Many times, the "built-in" functionality will be faster and more efficient than any code you can come up with (and no sense in re-inventing the wheel).

The vba code example that I have linked before works great and needs a slight change (in my opinion) to check the whole column and not the first cells of each.
That "slight change" is actually a "VERY big change". Comparing "one cell to one cell" is very different than comparing "thousands of cells" to "thousands of cells", especially if the duplicate columns are not right next to each other. I suppose you could write loops to do it, but that will increase processing time exponentially, and could make your code very slow. I don't think I would try to take that approach.

Cleaning up "bad data" or "bad data structure" is often very cumbersome, especially when you have huge amounts of data.
I really cannot think of a good solution for this in Excel.

It might be interesting to see what the data originally looked like before it went through the "other transformations".
Perhaps it is in a more workable format there.
 
Upvote 0
Thank you everyone I re worked on the doc finally get a solution with transposing the table and transposing it back and it actually works with good speed performance.

I will close the topic
 
Upvote 0

Forum statistics

Threads
1,214,915
Messages
6,122,214
Members
449,074
Latest member
cancansova

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top