Extract Duplicate Data to another sheet from huge data set

earthworm

Well-known Member
Joined
May 19, 2009
Messages
759
Office Version
  1. 2019
  2. 2016
Platform
  1. Windows
I have a huge data around 170000 rows . There is one column example B:B that contains value through which I can identify or remove duplicates .

Since data is huge i cannot apply countif or advance filter because it slows down the file. So anyone please help me to write a VBA code through which

duplicates can be identified and then copy to another sheet . That is if there is any value in a row that occurs more then once must be extracted to another sheet example sheet2.
 

Excel Facts

Which came first: VisiCalc or Lotus 1-2-3?
Dan Bricklin and Bob Frankston debuted VisiCalc in 1979 as a Visible Calculator. Lotus 1-2-3 debuted in the early 1980's, from Mitch Kapor.
Could you give us a small set of sample data, say 10-15 rows and 3 columns (if you have at least 3 columns in your data) and the expected results with XL2BB ?

I also suggest that you update your Account details (click your user name at the top right of the forum) so helpers always know what Excel version(s) & platform(s) you are using as the best solution often varies by version. (Don’t forget to scroll down & ‘Save’)
 
Upvote 0
Could you give us a small set of sample data, say 10-15 rows and 3 columns (if you have at least 3 columns in your data) and the expected results with XL2BB ?

I also suggest that you update your Account details (click your user name at the top right of the forum) so helpers always know what Excel version(s) & platform(s) you are using as the best solution often varies by version. (Don’t forget to scroll down & ‘Save’)

Suppose data starts from Cell A2 and below is the data. I need to extract duplicate to another sheet . If there is any data after A2 example B2 i also need to extract . We will do this part later . Lets start with simple one for my underrstanding

syed mehmood ahmed shahsyed amir ahmed shah44008Aftab Currency Exchange UK
FAISEL ASGHARMUSTAFA AZAM MUHAMMAD AZAM44011Aftab Currency Exchange UK
FAISEL ASGHARMUSTAFA AZAM MUHAMMAD AZAM44011Aftab Currency Exchange UK
Muhammad Umair RafiqueCh Ghulam Mustafa Ch Said Ahmed44012Aftab Currency Exchange UK
MOHAMMAD AWAIS SHARIFMIAN MOHAMMAD ASIF HAYAT44012Aftab Currency Exchange UK
MOHAMMED UMAR FarooqMuhammad Farooq44012Aftab Currency Exchange UK
MUHAMMAD FAHEEMjaved Iqbal44012Aftab Currency Exchange UK
ZAFAR IQBALShafia naheed Zafar44012Aftab Currency Exchange UK
Tariq MahmoodMuhammad Yasir44012Aftab Currency Exchange UK
Noman AnwarMUHAMMAD ANIS44012Aftab Currency Exchange UK
Sharjeel SadaqatSULEMAN KHAN DANISH44012Aftab Currency Exchange UK


I am using Windows Office Professional Plus 2016 X64
 
Upvote 0
syed mehmood ahmed shahsyed amir ahmed shah44008Aftab Currency Exchange UK
FAISEL ASGHARMUSTAFA AZAM MUHAMMAD AZAM44011Aftab Currency Exchange UK
FAISEL ASGHARMUSTAFA AZAM MUHAMMAD AZAM44011Aftab Currency Exchange UK

Muhammad Umair RafiqueCh Ghulam Mustafa Ch Said Ahmed44012Aftab Currency Exchange UK
MOHAMMAD AWAIS SHARIFMIAN MOHAMMAD ASIF HAYAT44012Aftab Currency Exchange UK
MOHAMMED UMAR FarooqMuhammad Farooq44012Aftab Currency Exchange UK
MUHAMMAD FAHEEMjaved Iqbal44012Aftab Currency Exchange UK
ZAFAR IQBALShafia naheed Zafar44012Aftab Currency Exchange UK
Tariq MahmoodMuhammad Yasir44012Aftab Currency Exchange UK
Noman AnwarMUHAMMAD ANIS44012Aftab Currency Exchange UK
Sharjeel SadaqatSULEMAN KHAN DANISH44012Aftab Currency Exchange UK
So there is just one pair of duplicates in that sample. Should both of those rows be extracted to the other sheet or just one? Since I don't know what is in the other columns I don't know if the whole row will be duplicated or just the value in this column.
 
Upvote 0
So there is just one pair of duplicates in that sample. Should both of those rows be extracted to the other sheet or just one? Since I don't know what is in the other columns I don't know if the whole row will be duplicated or just the value in this column.

Yes, Any similar instances which you have highlighted must be pasted to another sheet that is both rows. Whatever the value is appearing in another column example B2 or C2 must also be pasted . I will customize this part later on. However My main criteria is in cell A1 only .

Please also note that my data is huge around 180000 hence i wont suggest infinite range because that will slow down the whole system and worksheet. You can specifiy define rows and columns that can be customize easily . Once done i will bother you to explain me the concept if you dont mind . :) You need to teach me like i am baby VBA part specially Dim , Int and Var part fly over my head .
 
Upvote 0
I tried pivot table technique by throwing the values in row and count and apply filer count > 1 . However when I am double clicking the data excel is not showing filtered content . Instead it is showing entire content . :(
 
Upvote 0
I don't have any representative data as big as yours but give this a try in a copy of your workbook. My sample data is on a sheet called "Data". Edit the 'Const' line near the top of the code to match your sheet name before running it.

Thanks for updating your profile.

VBA Code:
Sub Extract_Dupes()
  Dim d As Object
  Dim a As Variant, b As Variant
  Dim nc As Long, i As Long, k As Long
  
  Const MyDataSheetName As String = "Data"  '<- Edit to suit
 
  Set d = CreateObject("Scripting.Dictionary")
  d.CompareMode = 1
  With Sheets(MyDataSheetName)
    nc = .Cells.Find(What:="*", LookIn:=xlFormulas, SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column + 1
    a = .Range("A2", .Range("A" & .Rows.Count).End(xlUp)).Value
  End With
  ReDim b(1 To UBound(a), 1 To 1)
  For i = 1 To UBound(a)
    d(a(i, 1)) = d(a(i, 1)) + 1
  Next i
  For i = 1 To UBound(a)
    If d(a(i, 1)) = 1 Then
      b(i, 1) = 1
      k = k + 1
    End If
  Next i
  If k > 0 Then
    Application.ScreenUpdating = False
    Sheets(MyDataSheetName).Copy After:=Sheets(MyDataSheetName)
    With Sheets(Sheets(MyDataSheetName).Index + 1).Range("A2").Resize(UBound(a), nc)
      .Columns(nc).Value = b
      .Sort Key1:=.Columns(nc), Order1:=xlAscending, Header:=xlNo
      .Resize(k).EntireRow.Delete
    End With
    Application.ScreenUpdating = True
  End If
End Sub
 
Upvote 0
another approach with Power Query

RawRaw
syed mehmood ahmed shahsyed amir ahmed shah44008Aftab Currency Exchange UKFAISEL ASGHARMUSTAFA AZAM MUHAMMAD AZAM44011Aftab Currency Exchange UK
FAISEL ASGHARMUSTAFA AZAM MUHAMMAD AZAM44011Aftab Currency Exchange UKFAISEL ASGHARMUSTAFA AZAM MUHAMMAD AZAM44011Aftab Currency Exchange UK
FAISEL ASGHARMUSTAFA AZAM MUHAMMAD AZAM44011Aftab Currency Exchange UK
Muhammad Umair RafiqueCh Ghulam Mustafa Ch Said Ahmed44012Aftab Currency Exchange UK
MOHAMMAD AWAIS SHARIFMIAN MOHAMMAD ASIF HAYAT44012Aftab Currency Exchange UK
MOHAMMED UMAR FarooqMuhammad Farooq44012Aftab Currency Exchange UK
MUHAMMAD FAHEEMjaved Iqbal44012Aftab Currency Exchange UK
ZAFAR IQBALShafia naheed Zafar44012Aftab Currency Exchange UK
Tariq MahmoodMuhammad Yasir44012Aftab Currency Exchange UK
Noman AnwarMUHAMMAD ANIS44012Aftab Currency Exchange UK
Sharjeel SadaqatSULEMAN KHAN DANISH44012Aftab Currency Exchange UK

Rich (BB code):
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    KeptDup = let columnNames = {"Raw"}, addCount = Table.Group(Source, columnNames, {{"Count", Table.RowCount, type number}}), selectDuplicates = Table.SelectRows(addCount, each [Count] > 1), removeCount = Table.RemoveColumns(selectDuplicates, "Count") in Table.Join(Source, columnNames, removeCount, columnNames, JoinKind.Inner)
in
    KeptDup
 
Upvote 0
I don't have any representative data as big as yours but give this a try in a copy of your workbook. My sample data is on a sheet called "Data". Edit the 'Const' line near the top of the code to match your sheet name before running it.

Thanks for updating your profile.

VBA Code:
Sub Extract_Dupes()
  Dim d As Object
  Dim a As Variant, b As Variant
  Dim nc As Long, i As Long, k As Long
 
  Const MyDataSheetName As String = "Data"  '<- Edit to suit

  Set d = CreateObject("Scripting.Dictionary")
  d.CompareMode = 1
  With Sheets(MyDataSheetName)
    nc = .Cells.Find(What:="*", LookIn:=xlFormulas, SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column + 1
    a = .Range("A2", .Range("A" & .Rows.Count).End(xlUp)).Value
  End With
  ReDim b(1 To UBound(a), 1 To 1)
  For i = 1 To UBound(a)
    d(a(i, 1)) = d(a(i, 1)) + 1
  Next i
  For i = 1 To UBound(a)
    If d(a(i, 1)) = 1 Then
      b(i, 1) = 1
      k = k + 1
    End If
  Next i
  If k > 0 Then
    Application.ScreenUpdating = False
    Sheets(MyDataSheetName).Copy After:=Sheets(MyDataSheetName)
    With Sheets(Sheets(MyDataSheetName).Index + 1).Range("A2").Resize(UBound(a), nc)
      .Columns(nc).Value = b
      .Sort Key1:=.Columns(nc), Order1:=xlAscending, Header:=xlNo
      .Resize(k).EntireRow.Delete
    End With
    Application.ScreenUpdating = True
  End If
End Sub

AMAZING ! IT WORKED PERFECTLY . PLEASE TEACH ME HOW YOU DID THAT . I dont understand the code :( .
 
Upvote 0
Glad it helped.

PLEASE TEACH ME HOW YOU DID THAT . I dont understand the code
Hard to condense 15-20 years of learning into a few sentences. ;)

The gist of the code is ..
  • copy the whole sheet
  • read all the column-of-interest values into memory
  • loop through all those values and count them
  • loop through them again and if the count is 1 note that row
  • go back to the sheet and mark all the 'noted' rows
  • sort all the data so all those single-count values are together
  • delete all the single-count rows
 
Upvote 0

Forum statistics

Threads
1,214,788
Messages
6,121,603
Members
449,038
Latest member
Arbind kumar

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top