Extract 4 digit year from inconsistent string of text

kirksmim

New Member
First posting here so apologies for anything I miss out.

I work in an academic library and I'm trying to profile the collection by year published. The only problem is that the dates for academic journals (which span several years) are entered manually and so appear in a seemingly endless number of different formats.

Below I have posted an example of the types of dates.

While this is obviously a mess, the reason I retain some hope is that the only 4-digit numbers are years, so if I could somehow extract these it would give me a fighting chance.


Start date  Summary                                Description
(blank)     N1-VOL.5, 1949-1953.                   N1-VOL.5, 1949-1953.
1996        VOL.1 NO.1 (1996)-Vol. 8 No. 1 (2003)  VOL.1 NO.1 (1996)-
(blank)     1955-1977.                             1955-1977.
2010        VOL.398 (2010)-                        Vol. 412, No. 8896 (2014 Jul. 19)
2010        VOL.398 (2010)-                        Vol. 412, No. 8897 (2014 Jul. 26)

In summary: does anyone know a query for 'extract 4-digit numbers from string of text and numbers'? If not, does anyone have any idea how else I can solve this?

Many thanks,
Michael
 


kirksmim

New Member
Excel 2012

Row  O        P     Q                                      R
2    (blank)  9999  N1-VOL.5, 1949-1953.                   N1-VOL.5, 1949-1953.
3    1996     9999  VOL.1 NO.1 (1996)-Vol. 8 No. 1 (2003)  VOL.1 NO.1 (1996)-
4    (blank)  9999  1955-1977.                             1955-1977.
5    2010     9999  VOL.398 (2010)-                        Vol. 412, No. 8896 (2014 Jul. 19)
6    2010     9999  VOL.398 (2010)-                        Vol. 412, No. 8897 (2014 Jul. 26)

MKS tables
 

Rhodie72

Well-known Member
This is a classic "Select Case" opportunity for VBA code. This means getting under the hood of Excel and writing a very clever bit of code that does this for you. The brilliant thing about using "Select Case" is that you can always add more varieties without expanding a formula to complex and, frankly, unmanageable scales.

My recommendation for your request is that you learn about macros and Excel VBA coding first. It makes the whole process of collecting and storing data easier, and you can manipulate the data far better with code than with formulae.

The best bit is that once you've done the hard work, updates to it are simple and usually improve the code.
 

Special-K99

Well-known Member
I'm quite pleased with this :)

Assumes the following:
Data in column A
Output in column B

Runs down column A, searches each cell, extracts all values where four digits are next to each other

Code:
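Sub extractyears()
    ' Scans column A of Sheet1 and writes every run of four adjacent digits to column B.
    ' Note: a longer number such as 12345 yields overlapping hits (1234 and 2345).
    Dim ws As Worksheet
    Dim lastrow As Long, m As Long, i As Long, j As Long, k As Long
    Dim n As String, l As String

    Set ws = Worksheets("Sheet1")
    lastrow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For m = 1 To lastrow
        n = CStr(ws.Cells(m, 1).Value)   ' text to search
        l = ""                           ' results for this row
        For i = 1 To Len(n) - 3
            ' Count how many of the four characters starting at position i are digits
            k = 0
            For j = 1 To 4
                If Mid(n, (i - 1) + j, 1) >= "0" And Mid(n, (i - 1) + j, 1) <= "9" Then
                    k = k + 1
                End If
            Next j
            ' All four are digits, so add them to the output
            If k = 4 Then
                l = l & Mid(n, i, 4) & " "
            End If
        Next i
        ws.Cells(m, 2).Value = l
    Next m
End Sub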
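
For anyone who would rather call this from a worksheet cell, here is a rough alternative sketch using a regular expression. It is not the macro above, just one possible variation. The function name ExtractYears and the minYear/maxYear limits are my own assumptions for illustration. The year range is there because the sample data also contains four-digit issue numbers (e.g. "No. 8896") that are not years, so adjust the limits to suit your collection. It relies on the VBScript.RegExp object, so it assumes Excel on Windows.

```vba
' Sketch only: returns the four-digit numbers in txt that fall within a plausible year range.
' ExtractYears, minYear and maxYear are illustrative names, not part of the macro above.
Function ExtractYears(ByVal txt As String, _
                      Optional ByVal minYear As Long = 1500, _
                      Optional ByVal maxYear As Long = 2100) As String
    Dim re As Object, matches As Object, m As Object
    Dim yr As Long, result As String

    ' Late-bound regular expression object (assumes Windows)
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' Exactly four digits: preceded by start of text or a non-digit, and not followed by a digit
    re.Pattern = "(^|\D)(\d{4})(?!\d)"

    Set matches = re.Execute(txt)
    For Each m In matches
        ' SubMatches(1) is the second capture group, i.e. the four digits themselves
        yr = CLng(m.SubMatches(1))
        If yr >= minYear And yr <= maxYear Then
            result = result & m.SubMatches(1) & " "
        End If
    Next m

    ExtractYears = Trim$(result)
End Function
```

Paste it into a standard module and use it like any other function, e.g. =ExtractYears(A2) in column B. With the default limits, "Vol. 412, No. 8896 (2014 Jul. 19)" should return 2014 and "N1-VOL.5, 1949-1953." should return 1949 1953.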
 

kirksmim

New Member
And so you should be!

That worked perfectly thanks very much.

I feel this is only the first of many hurdles I'll come across with this data, but I am very grateful for your help with getting me over this one!
 
