Calculate Duplicate Frequency

MisterMeehan

New Member
Joined
Jun 22, 2015
Messages
10
Hi,


New poster here :wink:


I have a large dataset (~100,000 rows) where I am trying to count the number of duplicate values in a given column. (There are a lot!)


I would like to generate two columns showing which numbers are duplicates (column 1) and how many times (frequency) that each number is repeated (column 2). Please see example below.

How do I do that?



Sorry if this is an obvious thing to work out, but I have no experience with anything other than rudimentary tasks in Excel. Furthermore, I cannot see an example of how to do this in the archives.


I trust this can be calculated using a formula because I wouldn’t know how to apply any Visual Basic. Please advise how to work this out in easy steps.


Thanks for any help given.



Numbers

DuplicateNo.

Frequency

30

10

2

30

20

4

10

30

2

10

40

3

40

40

40

20

20

20

20


 

Excel Facts

How to calculate loan payments in Excel?
Use the PMT function: =PMT(5%/12,60,-25000) is for a $25,000 loan, 5% annual interest, 60 month loan.
I would suggest you try the COUNTIF or COUNTIFS functions, like this. Note that I've selected the entire column A; you can select the range A2 to A100000 or whatever if you like. Also, watch this and the other videos to learn everything you ever wanted to know about Excel's counting with criteria. https://www.youtube.com/watch?v=ohGgZ5ik4Xo


Excel 2012
ABC
1NumbersDuplicateNo.Frequency
230102
330204
410302
510403
640
740
840
920
1020
1120
1220
Sheet48
Cell Formulas
RangeFormula
C2=COUNTIFS(A:A,B2)
C3=COUNTIFS(A:A,B3)
C4=COUNTIFS(A:A,B4)
C5=COUNTIFS(A:A,B5)
 
Last edited:
Upvote 0
Given that you have a huge set, try to run a pivot table...

Row\Col
B​
C​
D​
E​
F​
1​
NumbersRow LabelsCount of Numbers
2​
30​
10
2​
3​
30​
20
4​
4​
10​
30
2​
5​
10​
40
3​
6​
40​
Grand Total
11​
7​
40​
8​
40​
9​
20​
10​
20​
11​
20​
12​
20​
13​
 
Upvote 0
Hi and thank you for that explanation and I looked at that video. But this is not giving me a solution. Perhaps I didn't explain it clearly.

I only have column A (telephone numbers). I need to generate column B (duplicate telephone numbers) and column C (the number of duplicates).

How would I go about that?

Thanks again.

PS. Another person suggested a pivot table. Would that help?
 
Upvote 0
Thanks for the suggestion of the pivot table. I am not sure how it helps, but I will try it.

How did you generate columns E and F? Because that is what I am trying to do. Did you use COUNTIFS to get those results?

Thanks
 
Upvote 0
Aladin's suggestion of pivot table (which is what he did for columns E and F) is the best option here because pivot tables automatically find all the duplicates and count them (and can do tons of other calculations). There is a way to use other complex formulas to create the list of duplcates and then Countifs to count them.
 
Upvote 0
100,000 phone numbers if you are state side the way numbers are formatted with area code and the exchange would only allow for a block of 10,000 number for one exchange (area code) xxx-0000, xxx-9999 and 100,000 would make it a large pivot table if you actually have 90,000 unique numbers with 10,000 duplicates. have you tried sorting small to large and looking at conditional formatting to get an idea of how many duplicates really have first. then perhaps rework the problem
 
Upvote 0
Okay thanks for the guidance on this. I will wait until I hear from Aladin on how he got that pivot table to catch all the duplicates. Hoping this task is not as difficult as it looks.

Cheers
 
Upvote 0
Yes, there are 100,000 phone numbers, and each row is meant to have a unique customer-telephone relationship, but as it turns out there are some 30,000 duplicates which is a huge number to deal with (Somebody tampered with the data... but that is another story).

As you can gather, I am not very pro with this, but have got the telephone column sorted so that there is a logical sequence whereby all duplicates are bunched together (but still interspersed amongst non-duplicates), so far so good, but now the task is to calculate and table which numbers are duplicated and how many times the duplication occurs for those numbers.

Seems simple... but am cracking me head on this :mad:

Thanks!

100,000 phone numbers if you are state side the way numbers are formatted with area code and the exchange would only allow for a block of 10,000 number for one exchange (area code) xxx-0000, xxx-9999 and 100,000 would make it a large pivot table if you actually have 90,000 unique numbers with 10,000 duplicates. have you tried sorting small to large and looking at conditional formatting to get an idea of how many duplicates really have first. then perhaps rework the problem
 
Upvote 0
are you wanting to remove the duplicates?
 
Upvote 0

Forum statistics

Threads
1,214,520
Messages
6,120,003
Members
448,935
Latest member
ijat

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top