Excel to SQL - Handling Duplicate Values

mskusace

New Member
Joined
Jan 2, 2019
Messages
18
I have an Excel spreadsheet which I have been automating via VBA macros, but it appears the processing time has increased significantly because one worksheet has 222,465 rows of data. That worksheet will only accumulate more data, so I would like to convert this into an environment that can process the data quickly. Would SQL in Microsoft Access be ideal or is there a better solution? I have R, Python, Microsoft Access, and MySQL Workbench.

I have 3 workbooks I am pulling data from and I do perform some data manipulation.

Here are the formulas I currently have in my spreadsheet:

Column: OB
Code:
=SUBSTITUTE(SUBSTITUTE([@[OBN]]," ",""),"-","")
This cleans up the OBN column removing all hyphens and spaces to standardize it based on the data table.

Columb DO
Code:
=IF(OR([@[DON]]="NA",ISBLANK(TRIM([@[DON]])),[@[DON]]="N/A",ISNA([@[DON]]),ISBLANK([@[DON]])),"",[@[DON]])
This makes the DO column blank if there is no value there or if it's NA or N/A, as that is how the report displays it. Otherwise, use the value of DON as DO.

Column: OB+DO (This is the ID in the tables below)
Code:
=IF([@DO]="",[@OB],CONCATENATE([@OB],"-",[@DO]))
Concatenate columns OB and DO with a hyphen if DO contains a value that is not NA or N/A. Otherwise, just use OB.

Now, what I am trying to handle is in the post I made here https://www.mrexcel.com/forum/excel-questions/1085373-handling-duplicate-values.html.

I apologize for duplicating the thread, but the problem has expanded if I have to convert the entire process.
[HR][/HR]
Lookup Table in Worksheet 1.


IDCODEREASSIGN(CODE)
W12-05AAZZ
W12-05BBZZ
W14-01CCCC
W14-03AAAA
W15-01DDDD
W15-01DDDD


<tbody>
</tbody>
ID and CODE are generated from the report
REASSIGN(CODE) is what I am trying to accomplish. If there are multiple IDs (ex. W12-05) that are the same with different CODE values, then assign it to "ZZ" . However, if there are multiple IDs and they all have the same CODE values (ex. W15-01), then assign it to the same code value.
[HR][/HR]There are 2 Worksheets (Worksheet 2 and Worksheet 3) that house the ID and different data associated with that ID.

Data Table:


IDLOOKUP REASSIGN(CODE)
W12-05ZZ
W14-01CC
W14-03AA
W15-01DD


<tbody>
</tbody>
How can I accomplish the REASSIGN(CODE) column in the first table? Something along the lines of:
If IDs duplicate and multiple different CODEs, assign to ZZ.
If IDs duplicate and same codes, assign to same CODE.


There might be instances where if there certain multiple codes for duplicate IDs, it will need to assign to a specific CODE. "ZZ" is the default when I don't know where it should go and it can be handled by another department. Sometimes I know that code "EE" and code "FF" belong to one department and there is logic to assign it to "GG" for example.
[HR][/HR]I want to combine all 3 Worksheets based on the ID column and I want all the data from Worksheet 2 and Worksheet 3 to be there, even if there isn't a match in IDs. Worksheet 1 will be used just as a lookup to pull in the REASSIGN(CODE).

Please let me know if I need to explain further or if there are any questions. I am open to ideas and other solutions, since this is just the logic that came into my head. I am not sure if it's better to create another table for the REASSIGN(CODE) or if my logic is sound.
 

Some videos you may like

Excel Facts

Create a chart in one keystroke
Select the data and press Alt+F1 to insert a default chart. You can change the default chart to any chart type

mskusace

New Member
Joined
Jan 2, 2019
Messages
18
I had to move to SQL since Excel was not handling the data very well. I have it fully functioning now in SQL. If anyone is curious on the solution, I have posted it below.

SELECT [Table1].[ID], [Table1].
Code:
, Count([Table1].[CODE]) AS [CODE Count]
FROM [Table1]
GROUP BY [Table1].[ID], [Table1].[CODE];


I first do a count of CODE and group by ID and CODE. This gets me the count of each CODE per ID. For my example above, W12-05 would be listed twice and would have a count of 1 and 1, results in the table below.

Table2
[TABLE="width: 209"]
<tbody>[TR]
[TD]ID[/TD]
[TD]CODE[/TD]
[TD]CODE Count[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]AA[/TD]
[TD="align: right"]1[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]BB[/TD]
[TD="align: right"]1[/TD]
[/TR]
[TR]
[TD]W14-01[/TD]
[TD]CC[/TD]
[TD="align: right"]1[/TD]
[/TR]
[TR]
[TD]W14-03[/TD]
[TD]AA[/TD]
[TD="align: right"]1[/TD]
[/TR]
[TR]
[TD]W15-01[/TD]
[TD]DD[/TD]
[TD="align: right"]2[/TD]
[/TR]
</tbody>[/TABLE]

Then, I do a count of ID's and grouped by ID.

SELECT [Table1].[ID], Count([Table1].[ID]) AS [ID Count]
FROM [Table1]
GROUP BY [Table1].[ID];

Table3
[TABLE="width: 500"]
<tbody>[TR]
[TD]ID[/TD]
[TD]ID Count[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]W14-01[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]W14-03[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]W15-01[/TD]
[TD]2[/TD]
[/TR]
</tbody>[/TABLE]

This will get me the total count of each ID.

Next, I compare the  ID count and CODE Count for each ID.

SELECT [Table2].[ID], [Table2].[CODE], [Table2].[CODE Count], [Table3].[Count ID]
FROM [Table2] LEFT JOIN [Table3] ON [Table2].[ID] = [Table3].[ID];


This yields the following table.

Table4
[TABLE="width: 500"]
<tbody>[TR]
[TD]ID[/TD]
[TD]CODE[/TD]
[TD]CODE Count[/TD]
[TD]ID Count[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]AA[/TD]
[TD]1[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]BB[/TD]
[TD]1[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]W14-01[/TD]
[TD]CC[/TD]
[TD]1[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]W14-03[/TD]
[TD]AA[/TD]
[TD]1[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]W15-01[/TD]
[TD]DD[/TD]
[TD]2[/TD]
[TD]2[/TD]
[/TR]
</tbody>[/TABLE]

The last step I do is compare the count of CODE and ID.

SELECT [Table4].[ID], [Table4].[CODE Count], [Table4].[Count ID], IIf([Table4].[CODE Count]<>[Table4].[Count ID],"ZZ", [Table4].CODE) AS CODE
FROM [Table4];


If the CODE Count and ID Count do not match, assign it to "ZZ".

Table5
[TABLE="width: 500"]
<tbody>[TR]
[TD]ID[/TD]
[TD]CODE Count[/TD]
[TD]ID Count[/TD]
[TD]CODE[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]1[/TD]
[TD]2[/TD]
[TD]ZZ[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]1[/TD]
[TD]2[/TD]
[TD]ZZ[/TD]
[/TR]
[TR]
[TD]W14-01[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]CC[/TD]
[/TR]
[TR]
[TD]W14-03[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]AA[/TD]
[/TR]
[TR]
[TD]W15-01[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]DD[/TD]
[/TR]
</tbody>[/TABLE]

From here, you can just take the DISTINCT values.

SELECT DISTINCT [Table5].[ID], [Table5].CODE
FROM [Table5];

[TABLE="width: 500"]
<tbody>[TR]
[TD]ID[/TD]
[TD]CODE[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]ZZ[/TD]
[/TR]
[TR]
[TD]W14-01[/TD]
[TD]CC[/TD]
[/TR]
[TR]
[TD]W14-03[/TD]
[TD]AA[/TD]
[/TR]
[TR]
[TD]W15-01[/TD]
[TD]DD[/TD]
[/TR]
</tbody>[/TABLE]



I am not sure if this is the optimal way, but it is a working solution for me. I hope this helps!

*EDIT*
Removed the [CODE] tags because I use [CODE] in my SQL and it was not displaying the code properly in the code block.
 
Last edited:

Forum statistics

Threads
1,089,296
Messages
5,407,437
Members
403,143
Latest member
CTremblay

This Week's Hot Topics

  • help please
    SORRY NOT ANY GOOD AT EXCEL SO HELP WOULD BE MUCH APPRECIATED this formula is in a sheet called ignore...
  • two formulas needed
    Hello, I'll try my best to explain this: First formula needed in Sheet1 cell A2: If Sheet1 cell B2 = Sheet2 cell B2 then return a 1. If not then...
  • Dynamic Counts
    Good afternoon, we are tidying up some data & the data seems to be growing quicker than we are tidying it up! What we confirm (by reviewing it...
  • Help Excel formula eliminate duplicate values and keep only 2 identical rows.
    as picture below column A has a duplicate value. but the values are not the same as the rule. sometimes 4 rows, sometimes 10 rows or 7 or 9...
  • Macro Compile Error Sub or Function not defined
    Hello, I am trying to run macros from a validation list, all macros have been created and run perfectly on there own but I'm getting a compile...
  • Last row combined with Current Region VBA
    I'm generally happy finding the last row of data through something like Lastrow = Cells(Rows.Count, "D").End(xlUp) but I don't always receive data...
Top