OK, here we go, I'll break the code up into chunks to try and explain the process, copy and paste each chunk sequentially into a code module. I'm assuming you will put this in your Source workbook (or you have a 3rd workbook, which is your process workbook, containing the code you've already written, which transfers the data from one to the other).
A quick note on the question about blanks - what I was looking for was confirmation of how to find the last row of data. I have continued with the assumption that any row with data on will have data in column B. So the process scans down column B to find the first blank cell, and takes that as the bottom of your list of data. If column B can be blank where the other columns are populated, then we'll need to tweak this to work with the strucuture of your data set.
The process is a little inefficient, as it works out where all the duplicates are, in order to give you the summary message you wanted, then with each pass to delete duplicates it has to rescan, because removing one set of duplicates makes the details gathered earlier obsolete. I couldn't think of a way that to gather the data that dynamically adjusts as the delete process runs through.
I've written this as a stand-alone Sub - you can add this to your workbook, and then call it from your existing routine. You could add parameters to it, e.g. workbook, worksheet, data start range, to me it more flexible and use objects you've already created for consistency - your call.
First we open the sub and declare all the objects and variables for use. If you're not passing the workbook/worksheet objects to the sub, make sure you change the names in the set statements in this section to reflect the construction of your workbook.
VBA Code:
Sub RemoveDuplicatesWithPrompt()
Dim wbDest As Workbook, shtDest As Worksheet, rCurrent As Range, rTest As Range, rLast As Range, rDupe As Range
Dim arrDupes() As String, i As Long, booAlreadyDuped As Boolean, lDupeRows As Long
Dim vResponse As Variant, vDupeRows As Variant, strMsg As String
'Setup base objects
Set wbDest = Workbooks("DestinationWorkbook")
Set shtDest = wbDest.Worksheets("DestinationWorksheet")
Set rCurrent = shtDest.Range("B2")
Do Until rCurrent.Offset(1, 0).Value = ""
Set rCurrent = rCurrent.Offset(1, 0)
Loop
Set rLast = rCurrent
ReDim arrDupes(3, 0)
rCurrent is the range we're going to look for duplicates of.
The process scans down column B to find the first blank cell, and sets rCurrent to be the one above (the last populated cell). We'll step backwards up the list to compile the set of duplicates.
Next we'll look at all the cells above that to find any with the same value. We're going to store the cell value, the address of the first cell with the value, the address of the last cell with the value, and the addresses of all the cells with the value in between as a group in a string array (
arrDupes). All addresses are bracketed by # symbols, so they can be distinguished readily.
VBA Code:
'Search all rows for duplicates
Do Until rCurrent.Row = 2
For i = 0 To UBound(arrDupes, 2)
If InStr(arrDupes(1, i) & arrDupes(2, i) & arrDupes(3, i), "#" & rCurrent.Address & "#") > 0 Then
booAlreadyDuped = True
End If
Next i
Set rTest = shtDest.Range("B2")
i = UBound(arrDupes, 2)
Do Until rTest.Address = rCurrent.Address Or booAlreadyDuped
If rTest.Value = rCurrent.Value Then
If arrDupes(0, i) = "" Then
arrDupes(0, i) = rTest.Value
arrDupes(1, i) = "#" & rTest.Address & "#"
arrDupes(3, i) = "#" & rCurrent.Address & "#"
ReDim Preserve arrDupes(3, i + 1)
Else
arrDupes(2, i) = arrDupes(2, i) & "#" & rTest.Address & "#"
End If
lDupeRows = lDupeRows + 1
End If
Set rTest = rTest.Offset(1, 0)
Loop
Set rCurrent = rCurrent.Offset(-1, 0)
booAlreadyDuped = False
Loop
There is one main loop, which steps
rCurrent up one row at a time, and two separate loops nested inside.
The first nested loop (
For i = 0 To UBound(arrDupes, 2)... Next i) checks whether the address of the
rCurrent cell has already been recorded as being a duplicate on an earlier pass. If it has been, the boolean value
booAlreadyDuped is set to true, and that skips the next loop.
The second nested loop (
Do Until rTest.Address = rCurrent.Address Or booAlreadyDuped...Loop) looks at each cell (
rTest) from the top of the list down to
rCurrent. If the values are the same (it's a duplicate), we increase the size of the second dimension of the array by one (
ReDim Preserve arrDupes(3, i + 1)), to accommodate the data about this duplicate. Then it records it in the string array (
arrDupes). The first time it matches (identified by looking at the first position in the first dimension of the array and seeing an empty string), the value is recorded in the first position of the first array, the address of
rTest (the first cell with the value - the one we're going to keep) in the second position, and the address of
rCurrent (the last cell with the value) in the final position of the first dimension. At the same time we're recording the number of duplicate values in the variable
lDupeRows, just so it's easy to refer to later when showing the summary message.
The next section is the long one, and deals with removing the duplicates.
VBA Code:
vResponse = MsgBox("There are " & UBound(arrDupes, 2) & " duplicate values in " & lDupeRows & " rows", vbOKCancel)
If vResponse = vbOK Then
For i = 0 To UBound(arrDupes, 2) - 1
Set rCurrent = shtDest.Range("B1").Resize(rLast.Row - 1, 1).Find(arrDupes(0, i), , xlValues, xlWhole)
Set rTest = rLast
arrDupes(2, i) = ""
arrDupes(3, i) = ""
Do Until rTest.Address = rCurrent.Address
If rTest.Value = rCurrent.Value Then
If arrDupes(3, i) = "" Then
arrDupes(3, i) = "#" & rTest.Address & "#"
Else
arrDupes(2, i) = "#" & rTest.Address & "#" & arrDupes(2, i)
End If
End If
Set rTest = rTest.Offset(-1, 0)
Loop
Set rTest = shtDest.Range(Replace(Replace(arrDupes(2, i) & arrDupes(3, i), "##", ","), "#", ""))
strMsg = "Row " & rCurrent.Row & " contains the value " & rCurrent.Value & " which is duplicated on " & rTest.Areas.Count & " rows;" & vbCrLf
For lDupeRows = 1 To rTest.Areas.Count
strMsg = strMsg & rTest.Areas(lDupeRows).Row & ","
Next lDupeRows
strMsg = Left(strMsg, Len(strMsg) - 1)
vResponse = MsgBox(strMsg, vbYesNo, "Duplicate values")
If vResponse = vbYes Then
For lDupeRows = rTest.Areas.Count To 1 Step -1
Set rDupe = rTest.Areas(lDupeRows)
rDupe.Offset(0, -1).Resize(rLast.Row - rDupe.Row, 4).Value = rDupe.Offset(1, -1).Resize(rLast.Row - rDupe.Row, 4).Value
rLast.Offset(0, -1).Resize(1, 4).ClearContents
Set rLast = rLast.Offset(-1, 0)
Next lDupeRows
Else
MsgBox "Skipping " & rTest.Areas.Count & " duplicates"
End If
Next i
End If
End Sub
First we display the summary message and ask the user to continue or not (OK/Cancel). If they respond with OK, we carry on, if not, then it's straight to the end of the Sub and exit.
Everything else is wrapped in a loop (
For i = 0 To UBound(arrDupes, 2) - 1...Next i) which steps through each position in the second dimension of the array (skipping the last one, as the last position is never populated by the first stage).
We now have to repeat much of what we did in the first stage, because as soon as we delete one set of duplicates the addresses we stored for the others are not longer correct.
We already know what values have duplicates in the list, as it's in the array.
The difference this time is we'll set
rCurrent to be the first instance of the value (looking down from the top) - we do this with the
.Find method of the Excel Range object which returns the first cell where a stated value is found in a searched range -
rTest to the bottom of the list and work up. This is because to 'delete' the duplicate, what we'll do is copy the values from all the rows below it up one, then clear the bottom row, rather than actually delete cells, as this means we don't have to worry about how that affects content in other areas of the worksheet (and thus replicates the built in Remove Duplicates functionality as closely as possible)
Having rebuilt the list of cells containing the duplicate value we display another message to the user and ask them for confirmation to go ahead with the delete.
If the response is "Yes", we run another loop for each cell containing the duplicate value, copying the values below up one, and clearing the last row. If not, we display a message to say we're skipping that duplicate value set, and move on to the next one.
This:
rDupe.Offset(0, -1).Resize(rLast.Row - rDupe.Row, 4).Value = rDupe.Offset(1, -1).Resize(rLast.Row - rDupe.Row, 4).Value is the command that copies the values up one.
rDupe.Offset(0,-1) - first we start with the cell one to the left (column A) of the cell with the duplicate value (column B)
.Resize(rLast.Row - rDupe.Row, 4) - then we expand the reference to contain the number of rows between the duplicate value and the last row, and stretch it over to cover 4 columns.
The other side of the equals is the same range, but dropped down one, thus copying all the values up one row.
That, then, should do the trick as requested.
Obviously it would have been a whole lot quicker and easier to just use the built in functionality, and not given the user any choice. That would have looked like this;
VBA Code:
Set rCurrent = shtDest.Range("B1").End(xlDown)
Set rCurrent = rCurrent.Offset(1 - rCurrent.Row, -1).Resize(rCurrent.Row, 4)
rCurrent.RemoveDuplicates Columns:=2, Header:=xlYes