Aggregating records with date-stamps into years (or months)

sdspieg

Board Regular
Joined
Dec 9, 2013
Messages
86
I have a spreadsheet with 3 columns (the first one has a date like '20100224', the second one a text string like 'european union' and the third one a numeric string with a value like '3') and 250'000 rows. Is there a way to aggregate and reorganize these data in such a way that I get sthg like this - i.e. with all dates aggregated into years, and those years then displayed vertically in declining order and horizontally next to each other first a column with the top text string for year x and in the next column the sum of all the values for that text string for that year; and then in the next two columns the same but then for year x+1, etc.?
 

Some videos you may like

Excel Facts

Select a hidden cell
Somehide hide payroll data in column G? Press F5. Type G1. Enter. Look in formula bar while you arrow down through G.

Momentman

Well-known Member
Joined
Jan 11, 2012
Messages
4,066
Office Version
  1. 365
Platform
  1. Windows
  2. MacOS
Sample data?
 
Last edited:

sdspieg

Board Regular
Joined
Dec 9, 2013
Messages
86
Would you mind posting sample data - before and after to make your request clearer?

Of course not. I had added links to a google spreadsheet in my message, but maybe I should have made that clearer :)

-Stephan
 

cyrilbrd

Well-known Member
Joined
Feb 2, 2012
Messages
4,071
Office Version
  1. 365
Platform
  1. Windows
  2. Mobile

ADVERTISEMENT

Hi,
dates are from 1956 to present (maybe greater range), do you want to see all dates?
do you consider the first word or letter in the string? what is it is a date?
values should be summed up or counted?
 

sdspieg

Board Regular
Joined
Dec 9, 2013
Messages
86
Hi,
dates are from 1956 to present (maybe greater range), do you want to see all dates?
Well in this file it does go back that far. For some other cases, it only goes back 10 years. What you see here are the textmining results for a number of official government websites of different countries. The second column contains the actual words we're interested in. The first column is the date of publication of the webpage. And the third column gives a numerical value for the importance of the word(s) in the second column. To be honest, I don't know what the most intuitive way is to present these data. All I know is that our analysts want to be able to analyze the main changes from year to year. And they definitely can't do that in such a format. But in a format like the one I suggested, they might be able to (at least for the past 10 years let's say). But if you have any better suggestions, I'd of course love to hear them!
do you consider the first word or letter in the string? what is it is a date?
Well, most of the time the second column is a word of acombination of 2 or three words. But yes, sometimes it's just letters like 'h e' (standing for 'his (or her) excellency'
values should be summed up or counted?
Summed up.
 

cyrilbrd

Well-known Member
Joined
Feb 2, 2012
Messages
4,071
Office Version
  1. 365
Platform
  1. Windows
  2. Mobile

ADVERTISEMENT

so 'hereinafter referred' would become 'hereinafter' only and 'european commission government' would become 'european' ? or would there be something more to it?
If the value is in correlation with the wording, and if it is in correlation with the importance, hence a 15 is ranking higher than a 7?
based on the following:
h e
contracts
h e mr.
e mr.
h e
h e dr.
h e mr.
h. e. mr.

<colgroup><col></colgroup><tbody>
</tbody>

<colgroup><col></colgroup><tbody>
</tbody>
what would be the 'words' to be analyzed here?

And here:
(europe
(europe west):
(europe west): yes,

<colgroup><col></colgroup><tbody>
</tbody>
would those be the same?

Extraction of the year is not a problem, extraction of the first string either but it would be better to confirm length and importance of said string.
 

sdspieg

Board Regular
Joined
Dec 9, 2013
Messages
86
so 'hereinafter referred' would become 'hereinafter' only and 'european commission government' would become 'european' ? or would there be something more to it?
If the value is in correlation with the wording, and if it is in correlation with the importance, hence a 15 is ranking higher than a 7?
based on the following:
h e
contracts
h e mr.
e mr.
h e
h e dr.
h e mr.
h. e. mr.

<tbody>
</tbody>

<tbody>
</tbody>
what would be the 'words' to be analyzed here?

And here:
(europe
(europe west):
(europe west): yes,

<tbody>
</tbody>
would those be the same?

Extraction of the year is not a problem, extraction of the first string either but it would be better to confirm length and importance of said string.
I'm trying to understand this. Is it hard to just use the entire string, whatever it may be? Point is - the system we set up identifies unigrams (one word), bigrams (two words) and trigrams (three). It ranks them all in declining order. So yes, we have europe and europe west and maybe even a trigram with europe. We are still working on getting rid of things like ( and : or , ). But my main question is whether it is possible to just aggregate ALL n-grams (whatever they may be - i.e. we'd want to see both Europe AND 'European Union' AND 'European Union problems') year by year.
 

cyrilbrd

Well-known Member
Joined
Feb 2, 2012
Messages
4,071
Office Version
  1. 365
Platform
  1. Windows
  2. Mobile
Noted to consider all data as 'unique'.
I looked only at 40t data instead of the whole range, considering 'european' we have up to 35 entries starting with that word and 127 occurrences of this word within a string... should those be unique or grouped?
if unique a pivot table will give you a recap of each unique data per year and will sum the corresponding value accordingly. Advantage is that you can select the range of years as necessary (note those 40t rows range from 1947 to 2012).
does this make sense to you?
 

sdspieg

Board Regular
Joined
Dec 9, 2013
Messages
86
Noted to consider all data as 'unique'.
I looked only at 40t data instead of the whole range, considering 'european' we have up to 35 entries starting with that word and 127 occurrences of this word within a string... should those be unique or grouped?
Unique I guess.
if unique a pivot table will give you a recap of each unique data per year and will sum the corresponding value accordingly. Advantage is that you can select the range of years as necessary (note those 40t rows range from 1947 to 2012).
Ok - I'll give it a try. Seems to me that when I tried that yesterday, the system told me that I had too many data. But I'll try again.
does this make sense to you?
Maybe :) - I'll report back.
 

Watch MrExcel Video

Forum statistics

Threads
1,122,822
Messages
5,598,307
Members
414,224
Latest member
Crazy_FC

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top