Is this Impossible?

MrGJEB

New Member
Joined
Jun 13, 2008
Messages
37
Hello,

I am currently conducting an audit of our internal invoicing procedures but am working with an incredibly unfriendly data set that is downloaded from oracle. The current description column contains the period (quarter/month etc) for which the invoice relates to but as you can see there is no set pattern or phrases used in this column. I am tryng to find some method of extracting the work period but am at a complete blank. i have tried various combinations of IF ISNUMBER and MID functions but cant get the desired outcome.

Any help will be most appreciated as i am looking at circa 15,000 records.

Thanks

<TABLE style="WIDTH: 544pt; BORDER-COLLAPSE: collapse" border=0 cellSpacing=0 cellPadding=0 width=725 x:str><COLGROUP><COL style="WIDTH: 449pt; mso-width-source: userset; mso-width-alt: 21906" width=599><COL style="WIDTH: 95pt; mso-width-source: userset; mso-width-alt: 4608" width=126><TBODY><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; WIDTH: 449pt; HEIGHT: 12.75pt; BORDER-TOP: windowtext 0.5pt solid; BORDER-RIGHT: windowtext 0.5pt solid" class=xl22 height=17 width=599>Current Data</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; WIDTH: 95pt; BORDER-TOP: windowtext 0.5pt solid; BORDER-RIGHT: windowtext 0.5pt solid" class=xl22 width=126>Desired Output</TD></TR><TR style="HEIGHT: 25.5pt" height=34><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; WIDTH: 449pt; HEIGHT: 25.5pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl23 height=34 width=599>Short Break Care provided for 9 - 11 April 2010 - 50 hours 23 - 25 April 2010 - 49.5 hours 7 - 9 May 2010 - 50 hours 21 - 23 May 2010 - 48 hours Total = 197.5 hours @ 3.12 per hour</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl24 align=right x:num="40269">Apr-10</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; WIDTH: 449pt; HEIGHT: 12.75pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl23 height=17 width=599>Quarter 1 and 2 funding for above project based on 」100K agreed amount for 2010/11</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl25>Quarter 1 and 2</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; WIDTH: 449pt; HEIGHT: 12.75pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl23 height=17 width=599>Quarter 3 claim invoice</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl25>Quarter 3</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; WIDTH: 449pt; HEIGHT: 12.75pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl23 height=17 width=599>Quarter 4 spend 2009/10 for above project</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl25>Quarter 4</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; WIDTH: 449pt; HEIGHT: 12.75pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl23 height=17 width=599>Qtr 4 funding</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl25>Quarter 4</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; WIDTH: 449pt; HEIGHT: 12.75pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl23 height=17 width=599>Quarter 4 funding 2009/10</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl25>Quarter 4</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; WIDTH: 449pt; HEIGHT: 12.75pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl23 height=17 width=599>Funding for above service based on Quarter 2 claim</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl25>Quarter 2</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; WIDTH: 449pt; HEIGHT: 12.75pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl23 height=17 width=599>Funding for above service based on Quarter 1 claim</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl25 x:str="Quarter 1 ">Quarter 1 </TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; WIDTH: 449pt; HEIGHT: 12.75pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl23 height=17 width=599>Funding for above service based on Quarter 2</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl25>Quarter 2</TD></TR><TR style="HEIGHT: 12.75pt" height=17><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext 0.5pt solid; BACKGROUND-COLOR: transparent; HEIGHT: 12.75pt; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl25 height=17>Retainer for March 2010 in accordance with Service Level Agreement.</TD><TD style="BORDER-BOTTOM: windowtext 0.5pt solid; BORDER-LEFT: windowtext; BACKGROUND-COLOR: transparent; BORDER-TOP: windowtext; BORDER-RIGHT: windowtext 0.5pt solid" class=xl24 align=right x:num="40238">Mar-10</TD></TR></TBODY></TABLE>
 

Excel Facts

How can you turn a range sideways?
Copy the range. Select a blank cell. Right-click, Paste Special, then choose Transpose.
Hi

The data under the "Current Data" column - is it completed by users at some point?

It seems as if it is free text that users complete, without necessarily following a specified structure?

Regards
Anéll
 
Upvote 0
Ugh! What's this come out of? Capita One? :(

I'll start the ball rolling but I'm not committing myself to any programming effort at the moment. That may change or someone else may step forward.

Don't think about worksheet functions for the moment because I strongly suspect the eventual solution will need to be a bit smarter than that. I would suggest that your first step is to come up with a set of rules in plain English which a programmer can turn into Visual Basic code - a set of rules which will exactly define the data you want extracted without giving you any false 'hits'.

For example:-

1) If the words "quarter" or "qtr" appear anywhere in the line followed by a 1, 2, 3 or 4, extract it. Look to see if the word "and " comes next followed by a 1, 2, 3 or 4, and extract that as well (how many "and"s might there be?), then proceed to look at the next line of input.

2) Otherwise look for the name of a month, either in full or as an abbreviation - and state which abbreviations are allowed - and if it's followed by a two- or four-digit number in a certain range (01-10? 2001-2010?) and not in the future, extract it and proceed to look at the next line of input.

3) Otherwise check if there's a sequence of characters which looks like a date (one or two numeric digits or the name of a month or the three-letter month abbreviation followed by a hyphen, dot or slash separator, then two or four numeric digits, optionally preceded by one or two numeric digits and the same separator, then call it a date and extract it and proceed to look at the next line of input.

I think you already realised how messy it is when you allow humans to key in free-form text!

When you think you have stated the rules which define all the possible ways the data can appear, you're ready to turn those rules into program code, but you have to work out those rules first because if you can't state something in words you're never going to be able to instruct a computer (program) to do it.

Once you've done that and you find yourself in the process of designing a program and producing code, you can progressively refine it until it does exactly what you want.

Comments on what I have written are more than welcome, from you or from anyone else on the board.
 
Last edited:
Upvote 0
Anell / Ruddles

Thanks you for your responses. Yes free form text is the bain of my life.

Thanks for starting us off with the protocols. A collegue and I will do some work on this in the next couple of days and will re-post, hopefully with the bones of some initial code that we can work on.

Thanks again.
 
Upvote 0
When you think you have stated the rules which define all the possible ways the data can appear, you're ready to turn those rules into program code, but you have to work out those rules first because if you can't state something in words you're never going to be able to instruct a computer (program) to do it.

Couldn't have put this better myself! An important lesson that everyone should learn I think!
 
Upvote 0
Here's Macro called Get_Periods that offers a start to what you may want. It uses regexp to pattern match your text. This is my first attempt with regexp, so it most likely can be approved upon. It works for the examples you gave.

If you web search regexp or Regular Expression, you will find several help sites.

To install the macro...
Alt+F11 to open the VBA edittor
From the VBA menu, select Insert\ Module
Paste the code below in the VBA edit window

The macro finds your "Period" matches in A2 to the last used cell in column A and places the results in column B

Code:
Sub Get_Periods()

    Dim RE As Object, REMatches As Object
    Dim cell As Range

    Set RE = CreateObject("VBScript.RegExp")
    
    With RE
        .MultiLine = False
        .Global = False
        .IgnoreCase = False
        
        .Pattern = "(((Quarter|Qtr)(s)? [1-4]( and [1-4])*)" & _
        "|(\b(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|" & _
        "Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\b \b(19|20)([0-9]{2})\b))"
        
    End With
    
    Application.ScreenUpdating = False
    
    For Each cell In Range("A2", Range("A" & Rows.Count).End(xlUp))
    
        Set REMatches = RE.Execute(cell.Text)
        
        If REMatches.Count = 0 Then
            cell.Offset(, 1).Value = vbNullString
        Else
            cell.Offset(, 1).Value = Replace(REMatches(0), "Qtr", "Quarter", , , vbTextCompare)
        End If
    
    Next cell
    
    Application.ScreenUpdating = True
    
    Set RE = Nothing
    
End Sub
Excel Workbook
AB
1Current DataMacro Output
2Short Break Care provided for 9 - 11 April 2011 - 50 hours 23 - 25 April 2011 - 49.5 hours 7 - 9 May 2010 - 50 hours 21 - 23 May 2010 - 48 hours Total = 197.5 hours @ 3.12 per hourApr-11
3Quarter 1 and 2 funding for above project based on ?100K agreed amount for 2010/11Quarter 1 and 2
4Quarter 2 and 3 and 4 claim invoiceQuarter 2 and 3 and 4
5Quarter 4 spend 2009/10 for above projectQuarter 4
6Qtrs 3 and 4 fundingQuarters 3 and 4
7Quarter 4 funding 2009/10Quarter 4
8Funding for above service based on Quarter 2 claimQuarter 2
9Funding for above service based on Quarter 1 claimQuarter 1
10Funding for above service based on Quarter 2Quarter 2
11Retainer for March 2010 in accordance with Service Level Agreement.Mar-10
Example results
 
Upvote 0
Hi

It is great to be able to get some relief with the help of macros. :)

But as a business analyst I'm thinking whether the following should not be done in your environment:

  1. Go to your department's business analyst or IT support person. :cool:

    Provide the users (who currently complete the free text) with drop down lists on the application from which they must select certain standing data statements - for example, Quarter 1 to Quarter 4.

    Then, after they have selected it, they can have a following set of options which they can choose from a drop down list to provide more detail (if it is really needed for auditing, etc.) for example:

    - Funding
    - Claim
    - Services rendered

    etc.

    Please note: (Determine whether this is really necessary - sometimes we want data that we never ever use, and it makes your whole process bloated!)
  2. If the users still insist to use free text after the above has been developed, you can make the use of the above dropdown lists compulsory but still make a field available where the user can type in a few details, but not longer than 20 - 30 characters.
  3. To do system changes can take some time to implement, so in the mean time maybe you can help an analyst or anyone in the position to do so to compile a set of business rules about how the free text must be entered, for example:

    -First enter the quarter, then the other more describing detail, and then whatever the user still wants to say.

    Example: Quarter 4 - Funding - Invoice for June 2009
  4. Everyone must then adhere to the chosen format and if there is a quality control measure, they can check whether the users are following the rules.
  5. You can then manipulate the data easier in Excel by using the "Left" function. Since all the "Quarters" will appear in exactly the same position and with the same format, you can easily obtain only the name of the specific quarter in a column and continue with your analysis.
I know this is not really Excel related, but sometimes we get so used to workarounds and processes that we want to 'panel beat' in Excel that we forget to make our processes and applications more streamline.

Hope you have a great day! :biggrin:

~Anéll
 
Upvote 0
AlphaFrog - thanks for this. Works great. We are just trying to amend it so that it picks up (1-30) and potentially (st,th,rd,nd) after the number but before the month. We will post the amended version.

Anell - all good points, unfortunately when you are dealing with historic systems, hundreds of users and no protocols for data entry this is what you end up with. Obviously i will recomend changes going forward but it is hard to break peoples habbits.

Thanks again
 
Upvote 0
Yes, I understand perfectly! If the human factor comes into play, you can never guarantee the correctness or accuracy of your data.

Fortunately we have experts like Mr Frog to help us - quite some impressive stuff (which I copied for when I might need it!) ;)

Keep well, and good luck with the human factors ... :ROFLMAO:
 
Upvote 0
Ok so i have made some ammendments to AlphaFrogs macro but now i am getting an application/object error on the 'highlighted' line. I have tried changing ".text" to ".value" but no joy.

any thoughts?


Sub Get_Periods2()
Dim RE As Object, REMatches As Object
Dim cell As Range
Set RE = CreateObject("VBScript.RegExp")

With RE
.MultiLine = False
.Global = False
.IgnoreCase = False

.Pattern = "(((Quarter|Qtr)(s)? [1-4]( and [1-4])*)" & _
"|((1(st)?|2(nd)?|3(rd)?|4(th)?|5(th)?|6(th)?|7(th)?|8(th)?|9(th)?|10(th)?|11(th)?|12(th)?|13(th)?|14(th)?|" & _
"15(th?|16(th)?|17(th)?|18(th)?|19(th)?|20(th)?|21(st)?|22(nd)?|(23(rd)?|24(th)?|25(th)?|26(th)?|27(th)?|28(th)?|" & _
"29(th)?|30(th)?|31(st)?)?)\b) (\b(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|" & _
"Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\b \b(19|20)([0-9]{2})\b))"

End With

Application.ScreenUpdating = False

For Each cell In Range("A2", Range("A" & Rows.Count).End(xlUp))

Set REMatches = RE.Execute(cell.Text)

If REMatches.Count = 0 Then
cell.Offset(, 1).Value = vbNullString
Else
cell.Offset(, 1).Value = Replace(REMatches(0), "Qtr", "Quarter", , , vbTextCompare)
End If

Next cell

Application.ScreenUpdating = True

Set RE = Nothing

End Sub
 
Upvote 0

Forum statistics

Threads
1,214,915
Messages
6,122,214
Members
449,074
Latest member
cancansova

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top