Excel to SQL - Handling Duplicate Values

mskusace

New Member
Joined
Jan 2, 2019
Messages
18
I have an Excel spreadsheet which I have been automating via VBA macros, but it appears the processing time has increased significantly because one worksheet has 222,465 rows of data. That worksheet will only accumulate more data, so I would like to convert this into an environment that can process the data quickly. Would SQL in Microsoft Access be ideal or is there a better solution? I have R, Python, Microsoft Access, and MySQL Workbench.

I have 3 workbooks I am pulling data from and I do perform some data manipulation.

Here are the formulas I currently have in my spreadsheet:

Column: OB
Code:
=SUBSTITUTE(SUBSTITUTE([@[OBN]]," ",""),"-","")
This cleans up the OBN column removing all hyphens and spaces to standardize it based on the data table.

Columb DO
Code:
=IF(OR([@[DON]]="NA",ISBLANK(TRIM([@[DON]])),[@[DON]]="N/A",ISNA([@[DON]]),ISBLANK([@[DON]])),"",[@[DON]])
This makes the DO column blank if there is no value there or if it's NA or N/A, as that is how the report displays it. Otherwise, use the value of DON as DO.

Column: OB+DO (This is the ID in the tables below)
Code:
=IF([@DO]="",[@OB],CONCATENATE([@OB],"-",[@DO]))
Concatenate columns OB and DO with a hyphen if DO contains a value that is not NA or N/A. Otherwise, just use OB.

Now, what I am trying to handle is in the post I made here https://www.mrexcel.com/forum/excel-questions/1085373-handling-duplicate-values.html.

I apologize for duplicating the thread, but the problem has expanded if I have to convert the entire process.

Lookup Table in Worksheet 1.


IDCODEREASSIGN(CODE)
W12-05AAZZ
W12-05BBZZ
W14-01CCCC
W14-03AAAA
W15-01DDDD
W15-01DDDD


<tbody>
</tbody>
ID and CODE are generated from the report
REASSIGN(CODE) is what I am trying to accomplish. If there are multiple IDs (ex. W12-05) that are the same with different CODE values, then assign it to "ZZ" . However, if there are multiple IDs and they all have the same CODE values (ex. W15-01), then assign it to the same code value.

There are 2 Worksheets (Worksheet 2 and Worksheet 3) that house the ID and different data associated with that ID.

Data Table:


IDLOOKUP REASSIGN(CODE)
W12-05ZZ
W14-01CC
W14-03AA
W15-01DD


<tbody>
</tbody>
How can I accomplish the REASSIGN(CODE) column in the first table? Something along the lines of:
If IDs duplicate and multiple different CODEs, assign to ZZ.
If IDs duplicate and same codes, assign to same CODE.


There might be instances where if there certain multiple codes for duplicate IDs, it will need to assign to a specific CODE. "ZZ" is the default when I don't know where it should go and it can be handled by another department. Sometimes I know that code "EE" and code "FF" belong to one department and there is logic to assign it to "GG" for example.

I want to combine all 3 Worksheets based on the ID column and I want all the data from Worksheet 2 and Worksheet 3 to be there, even if there isn't a match in IDs. Worksheet 1 will be used just as a lookup to pull in the REASSIGN(CODE).

Please let me know if I need to explain further or if there are any questions. I am open to ideas and other solutions, since this is just the logic that came into my head. I am not sure if it's better to create another table for the REASSIGN(CODE) or if my logic is sound.
 

Excel Facts

Move date out one month or year
Use =EDATE(A2,1) for one month later. Use EDATE(A2,12) for one year later.
I had to move to SQL since Excel was not handling the data very well. I have it fully functioning now in SQL. If anyone is curious on the solution, I have posted it below.

SELECT [Table1].[ID], [Table1].
Code:
, Count([Table1].[CODE]) AS [CODE Count]
FROM [Table1]
GROUP BY [Table1].[ID], [Table1].[CODE];


I first do a count of CODE and group by ID and CODE. This gets me the count of each CODE per ID. For my example above, W12-05 would be listed twice and would have a count of 1 and 1, results in the table below.

Table2
[TABLE="width: 209"]
<tbody>[TR]
[TD]ID[/TD]
[TD]CODE[/TD]
[TD]CODE Count[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]AA[/TD]
[TD="align: right"]1[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]BB[/TD]
[TD="align: right"]1[/TD]
[/TR]
[TR]
[TD]W14-01[/TD]
[TD]CC[/TD]
[TD="align: right"]1[/TD]
[/TR]
[TR]
[TD]W14-03[/TD]
[TD]AA[/TD]
[TD="align: right"]1[/TD]
[/TR]
[TR]
[TD]W15-01[/TD]
[TD]DD[/TD]
[TD="align: right"]2[/TD]
[/TR]
</tbody>[/TABLE]

Then, I do a count of ID's and grouped by ID.

SELECT [Table1].[ID], Count([Table1].[ID]) AS [ID Count]
FROM [Table1]
GROUP BY [Table1].[ID];

Table3
[TABLE="width: 500"]
<tbody>[TR]
[TD]ID[/TD]
[TD]ID Count[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]W14-01[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]W14-03[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]W15-01[/TD]
[TD]2[/TD]
[/TR]
</tbody>[/TABLE]

This will get me the total count of each ID.

Next, I compare the  ID count and CODE Count for each ID.

SELECT [Table2].[ID], [Table2].[CODE], [Table2].[CODE Count], [Table3].[Count ID]
FROM [Table2] LEFT JOIN [Table3] ON [Table2].[ID] = [Table3].[ID];


This yields the following table.

Table4
[TABLE="width: 500"]
<tbody>[TR]
[TD]ID[/TD]
[TD]CODE[/TD]
[TD]CODE Count[/TD]
[TD]ID Count[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]AA[/TD]
[TD]1[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]BB[/TD]
[TD]1[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]W14-01[/TD]
[TD]CC[/TD]
[TD]1[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]W14-03[/TD]
[TD]AA[/TD]
[TD]1[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]W15-01[/TD]
[TD]DD[/TD]
[TD]2[/TD]
[TD]2[/TD]
[/TR]
</tbody>[/TABLE]

The last step I do is compare the count of CODE and ID.

SELECT [Table4].[ID], [Table4].[CODE Count], [Table4].[Count ID], IIf([Table4].[CODE Count]<>[Table4].[Count ID],"ZZ", [Table4].CODE) AS CODE
FROM [Table4];


If the CODE Count and ID Count do not match, assign it to "ZZ".

Table5
[TABLE="width: 500"]
<tbody>[TR]
[TD]ID[/TD]
[TD]CODE Count[/TD]
[TD]ID Count[/TD]
[TD]CODE[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]1[/TD]
[TD]2[/TD]
[TD]ZZ[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]1[/TD]
[TD]2[/TD]
[TD]ZZ[/TD]
[/TR]
[TR]
[TD]W14-01[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]CC[/TD]
[/TR]
[TR]
[TD]W14-03[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]AA[/TD]
[/TR]
[TR]
[TD]W15-01[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]DD[/TD]
[/TR]
</tbody>[/TABLE]

From here, you can just take the DISTINCT values.

SELECT DISTINCT [Table5].[ID], [Table5].CODE
FROM [Table5];

[TABLE="width: 500"]
<tbody>[TR]
[TD]ID[/TD]
[TD]CODE[/TD]
[/TR]
[TR]
[TD]W12-05[/TD]
[TD]ZZ[/TD]
[/TR]
[TR]
[TD]W14-01[/TD]
[TD]CC[/TD]
[/TR]
[TR]
[TD]W14-03[/TD]
[TD]AA[/TD]
[/TR]
[TR]
[TD]W15-01[/TD]
[TD]DD[/TD]
[/TR]
</tbody>[/TABLE]



I am not sure if this is the optimal way, but it is a working solution for me. I hope this helps!

*EDIT*
Removed the [CODE] tags because I use [CODE] in my SQL and it was not displaying the code properly in the code block.
 
Last edited:
Upvote 0

Forum statistics

Threads
1,213,517
Messages
6,114,089
Members
448,548
Latest member
harryls

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top