VBA Data validation

foxozaur

New Member
Joined
Sep 24, 2019
Messages
11
Greetings MrExcel community,

I'm working on a VBA macro that allows me to send values from source workbook to destination workbook, with duplicates validation message box that provides two options - delete or retain. I'm done with the copy pasting values from source to destination files, however, I am struggling with the validating duplicates, the message box with two options "Keep or Delete" and removing duplicates within the destination file (the origin file does not matter in this case). A gentle reminder that this is a part of an existing macro!

Thanks in advance!

This is the sample file. I will lay every step I'd like to see in the macro.

This is the destination file with duplicates titled 'destinationfile.csv.'
1.PNG

I want the file to do the following:

1) validate total duplicates (I have built a SQL data feed) by locating the duplicate name
*No idea how to visualize it through excel, will use paint for that.
22222.png


2) pop a message box with a request to keep or delete (THIS IS ONLY FOR COLUMN B)
*No idea how to visualize it through excel, will use paint again.
Inked22_LI.jpg



3) The macro then deletes the selected ones, and displays the total deleted duplicates
2.PNG
 

Excel Facts

When did Power Query debut in Excel?
Although it was an add-in in Excel 2010 & Excel 2013, Power Query became a part of Excel in 2016, in Data, Get & Transform Data.
Define duplicate

Do all values in columns A to D need to match?
 
Upvote 0
Define duplicate

Do all values in columns A to D need to match?

We're only looking at duplicate - cell value that's identical to another cell - in column B. Basically, I consider a duplicate in the same manner Microsoft excel does, nothing to conceptualize here.


I'm not entirely sure what you mean by the value matches, could you clarify ?
 
Upvote 0
You want to delete rows that are duplicates - I want to confirm that you mean all the values on the two rows are identical, as you seem to imply that only one column is checked for duplicates.

Look at these three rows.

1580821884498.png


if you're only comparing the values in column B, then row 1 and row 3 are duplicates. If you are comparing all values in the row, they are not.
 
Upvote 0
FatBoyClam,

Thanks for clarifying. I'm implying that rows 1 and 3 (column B) are considered as duplicates. But the values in other columns can match and still not be considered as duplicates.
 
Upvote 0
FatBoyClam,

Thanks for clarifying. I'm implying that rows 1 and 3 (column B) are considered as duplicates. But the values in other columns can match and still not be considered as duplicates.
OK, thanks for that.


next question - how hung up are on you on having the buttons labelled "Keep" and "Delete"?

We can use a standard messagebox if your happy with "Yes" and "No" with a message like

Row x has a duplicate value (WhateverTheValueIs) to Row y.
Would you like to delete it?


Like this (using the 3 rows from my example earlier)

1580823636210.png


I'd write a procedure to scan down each row and compare it to the values in all the preceding rows, offering the user the option to delete each one as it goes along.

If the value appears on multiple rows, do you want a single message offering to delete *all* duplicates, or one for each row?

Do you want to delete the entire row from the spreadsheet, or just the 4 values, moving the contents of the cells below up (this is important if you have other things on your spreadsheet other than the table of data)

Also - last question on my mind - can I assume that there will be no blank cells in Column B (within the list of data)?
 
Upvote 0
OK, thanks for that.


next question - how hung up are on you on having the buttons labelled "Keep" and "Delete"?

We can use a standard messagebox if your happy with "Yes" and "No" with a message like

Row x has a duplicate value (WhateverTheValueIs) to Row y.
Would you like to delete it?


Like this (using the 3 rows from my example earlier)

View attachment 5678

I'd write a procedure to scan down each row and compare it to the values in all the preceding rows, offering the user the option to delete each one as it goes along.

If the value appears on multiple rows, do you want a single message offering to delete *all* duplicates, or one for each row?

Do you want to delete the entire row from the spreadsheet, or just the 4 values, moving the contents of the cells below up (this is important if you have other things on your spreadsheet other than the table of data)

Also - last question on my mind - can I assume that there will be no blank cells in Column B (within the list of data)?

FatBoyClam,

Thanks for helping out!

The answer to the first question would be - Absolutely not hung, Yes or No fits better.

Scanning every single row - it's exactly what I'm looking for. As for duplicates in multiple rows, it would awesome if we delete all the duplicates.

Answer to the second - I would like to consider empty rows, as I will continue populating these rows.

Would it be possible to receive a brief explanation for every step you took in order to retrieve the desired output? I want to keep learning VBA, seems like this is loads of fun.
 
Upvote 0
I'll have some code ready for you tomorrow - I've got to go out now.
 
Upvote 0
OK, here we go, I'll break the code up into chunks to try and explain the process, copy and paste each chunk sequentially into a code module. I'm assuming you will put this in your Source workbook (or you have a 3rd workbook, which is your process workbook, containing the code you've already written, which transfers the data from one to the other).

A quick note on the question about blanks - what I was looking for was confirmation of how to find the last row of data. I have continued with the assumption that any row with data on will have data in column B. So the process scans down column B to find the first blank cell, and takes that as the bottom of your list of data. If column B can be blank where the other columns are populated, then we'll need to tweak this to work with the strucuture of your data set.

The process is a little inefficient, as it works out where all the duplicates are, in order to give you the summary message you wanted, then with each pass to delete duplicates it has to rescan, because removing one set of duplicates makes the details gathered earlier obsolete. I couldn't think of a way that to gather the data that dynamically adjusts as the delete process runs through.

I've written this as a stand-alone Sub - you can add this to your workbook, and then call it from your existing routine. You could add parameters to it, e.g. workbook, worksheet, data start range, to me it more flexible and use objects you've already created for consistency - your call.

First we open the sub and declare all the objects and variables for use. If you're not passing the workbook/worksheet objects to the sub, make sure you change the names in the set statements in this section to reflect the construction of your workbook.

VBA Code:
Sub RemoveDuplicatesWithPrompt()

Dim wbDest As Workbook, shtDest As Worksheet, rCurrent As Range, rTest As Range, rLast As Range, rDupe As Range
Dim arrDupes() As String, i As Long, booAlreadyDuped As Boolean, lDupeRows As Long
Dim vResponse As Variant, vDupeRows As Variant, strMsg As String

'Setup base objects

Set wbDest = Workbooks("DestinationWorkbook")

Set shtDest = wbDest.Worksheets("DestinationWorksheet")

Set rCurrent = shtDest.Range("B2")
   
Do Until rCurrent.Offset(1, 0).Value = ""
    Set rCurrent = rCurrent.Offset(1, 0)
Loop

Set rLast = rCurrent

ReDim arrDupes(3, 0)

rCurrent is the range we're going to look for duplicates of.

The process scans down column B to find the first blank cell, and sets rCurrent to be the one above (the last populated cell). We'll step backwards up the list to compile the set of duplicates.

Next we'll look at all the cells above that to find any with the same value. We're going to store the cell value, the address of the first cell with the value, the address of the last cell with the value, and the addresses of all the cells with the value in between as a group in a string array (arrDupes). All addresses are bracketed by # symbols, so they can be distinguished readily.

VBA Code:
'Search all rows for duplicates

Do Until rCurrent.Row = 2

    For i = 0 To UBound(arrDupes, 2)
        If InStr(arrDupes(1, i) & arrDupes(2, i) & arrDupes(3, i), "#" & rCurrent.Address & "#") > 0 Then
            booAlreadyDuped = True
        
        End If
    Next i
    
    Set rTest = shtDest.Range("B2")
    i = UBound(arrDupes, 2)
    
    Do Until rTest.Address = rCurrent.Address Or booAlreadyDuped
        
       If rTest.Value = rCurrent.Value Then
        
            If arrDupes(0, i) = "" Then
                arrDupes(0, i) = rTest.Value
                arrDupes(1, i) = "#" & rTest.Address & "#"
                arrDupes(3, i) = "#" & rCurrent.Address & "#"
                
                ReDim Preserve arrDupes(3, i + 1)
                
            Else
                
                arrDupes(2, i) = arrDupes(2, i) & "#" & rTest.Address & "#"
            End If
        
            lDupeRows = lDupeRows + 1
        
        End If
        
        Set rTest = rTest.Offset(1, 0)
    Loop
    
    Set rCurrent = rCurrent.Offset(-1, 0)
    booAlreadyDuped = False
Loop

There is one main loop, which steps rCurrent up one row at a time, and two separate loops nested inside.

The first nested loop (For i = 0 To UBound(arrDupes, 2)... Next i) checks whether the address of the rCurrent cell has already been recorded as being a duplicate on an earlier pass. If it has been, the boolean value booAlreadyDuped is set to true, and that skips the next loop.

The second nested loop (Do Until rTest.Address = rCurrent.Address Or booAlreadyDuped...Loop) looks at each cell (rTest) from the top of the list down to rCurrent. If the values are the same (it's a duplicate), we increase the size of the second dimension of the array by one (ReDim Preserve arrDupes(3, i + 1)), to accommodate the data about this duplicate. Then it records it in the string array (arrDupes). The first time it matches (identified by looking at the first position in the first dimension of the array and seeing an empty string), the value is recorded in the first position of the first array, the address of rTest (the first cell with the value - the one we're going to keep) in the second position, and the address of rCurrent (the last cell with the value) in the final position of the first dimension. At the same time we're recording the number of duplicate values in the variable lDupeRows, just so it's easy to refer to later when showing the summary message.

The next section is the long one, and deals with removing the duplicates.

VBA Code:
vResponse = MsgBox("There are " & UBound(arrDupes, 2) & " duplicate values in " & lDupeRows & " rows", vbOKCancel)

If vResponse = vbOK Then

    For i = 0 To UBound(arrDupes, 2) - 1
    
        Set rCurrent = shtDest.Range("B1").Resize(rLast.Row - 1, 1).Find(arrDupes(0, i), , xlValues, xlWhole)
        
        Set rTest = rLast
        
        arrDupes(2, i) = ""
        arrDupes(3, i) = ""
        
        Do Until rTest.Address = rCurrent.Address
            
           If rTest.Value = rCurrent.Value Then
            
                If arrDupes(3, i) = "" Then
                    
                    arrDupes(3, i) = "#" & rTest.Address & "#"
                    
                Else
                    
                    arrDupes(2, i) = "#" & rTest.Address & "#" & arrDupes(2, i)
                End If
            
            End If
            
            Set rTest = rTest.Offset(-1, 0)
            
            
        Loop
        
        
        Set rTest = shtDest.Range(Replace(Replace(arrDupes(2, i) & arrDupes(3, i), "##", ","), "#", ""))
                
        strMsg = "Row " & rCurrent.Row & " contains the value " & rCurrent.Value & " which is duplicated on " & rTest.Areas.Count & " rows;" & vbCrLf
        
        For lDupeRows = 1 To rTest.Areas.Count
        
            strMsg = strMsg & rTest.Areas(lDupeRows).Row & ","
        
        Next lDupeRows
        
        strMsg = Left(strMsg, Len(strMsg) - 1)
    
        vResponse = MsgBox(strMsg, vbYesNo, "Duplicate values")
    
        If vResponse = vbYes Then
        
            For lDupeRows = rTest.Areas.Count To 1 Step -1
        
                Set rDupe = rTest.Areas(lDupeRows)
                
                rDupe.Offset(0, -1).Resize(rLast.Row - rDupe.Row, 4).Value = rDupe.Offset(1, -1).Resize(rLast.Row - rDupe.Row, 4).Value
            
                rLast.Offset(0, -1).Resize(1, 4).ClearContents
                Set rLast = rLast.Offset(-1, 0)
            
            Next lDupeRows
            
        
        Else
            MsgBox "Skipping " & rTest.Areas.Count & " duplicates"
        End If
        
    Next i


End If



End Sub

First we display the summary message and ask the user to continue or not (OK/Cancel). If they respond with OK, we carry on, if not, then it's straight to the end of the Sub and exit.

Everything else is wrapped in a loop (For i = 0 To UBound(arrDupes, 2) - 1...Next i) which steps through each position in the second dimension of the array (skipping the last one, as the last position is never populated by the first stage).

We now have to repeat much of what we did in the first stage, because as soon as we delete one set of duplicates the addresses we stored for the others are not longer correct.

We already know what values have duplicates in the list, as it's in the array.

The difference this time is we'll set rCurrent to be the first instance of the value (looking down from the top) - we do this with the .Find method of the Excel Range object which returns the first cell where a stated value is found in a searched range - rTest to the bottom of the list and work up. This is because to 'delete' the duplicate, what we'll do is copy the values from all the rows below it up one, then clear the bottom row, rather than actually delete cells, as this means we don't have to worry about how that affects content in other areas of the worksheet (and thus replicates the built in Remove Duplicates functionality as closely as possible)

Having rebuilt the list of cells containing the duplicate value we display another message to the user and ask them for confirmation to go ahead with the delete.

If the response is "Yes", we run another loop for each cell containing the duplicate value, copying the values below up one, and clearing the last row. If not, we display a message to say we're skipping that duplicate value set, and move on to the next one.

This: rDupe.Offset(0, -1).Resize(rLast.Row - rDupe.Row, 4).Value = rDupe.Offset(1, -1).Resize(rLast.Row - rDupe.Row, 4).Value is the command that copies the values up one.
rDupe.Offset(0,-1) - first we start with the cell one to the left (column A) of the cell with the duplicate value (column B)
.Resize(rLast.Row - rDupe.Row, 4) - then we expand the reference to contain the number of rows between the duplicate value and the last row, and stretch it over to cover 4 columns.
The other side of the equals is the same range, but dropped down one, thus copying all the values up one row.

That, then, should do the trick as requested.

Obviously it would have been a whole lot quicker and easier to just use the built in functionality, and not given the user any choice. That would have looked like this;

VBA Code:
    Set rCurrent = shtDest.Range("B1").End(xlDown)
    Set rCurrent = rCurrent.Offset(1 - rCurrent.Row, -1).Resize(rCurrent.Row, 4)
    
    rCurrent.RemoveDuplicates Columns:=2, Header:=xlYes
 
Upvote 0
FatBoyClam,


I cannot thank you hard enough for taking the time to script down and make a proper breakdown of the script!
 
Upvote 0

Forum statistics

Threads
1,214,851
Messages
6,121,931
Members
449,056
Latest member
denissimo

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top