Selected key words to be extracted from cell and then copied into another column

SOS123

New Member
Joined
Dec 15, 2013
Messages
4
Hi I am new to this forum. I have already gone through the other threads to make sure that I am not asking questions which have already been answered and I have asked two relatively very experienced Excel users but left with no answer, I now humbly request your assistance.

I have a huge database of 60,000 records in excel. There is a column with descriptions with lots of text. I need to extract specific text from each cell and copy it to another column.


The keywords to extract are as follows (the extraction should ignore capitalisation of letters):

“Commercial Property”
“Residential Property”
“Family”
“Divorce”
“Professional Negligence”
“Medical Negligence”
“Clinical Negligence”
“Negligence”
“Personal Injury”
“Conveyancing”
“Residential Conveyancing”
“Commercial Conveyancing”
“Domestic Conveyancing”
“Crime”
“Criminal Law”
“Fraud”
“Serious Organised Crime”
"Wills"
"Probate"

Etc….

The extract data needs to be placed in the column next to the target column.

Where more than one keyword appears in the target cell, the keywords need to be extracted and placed separated by comma.

AB
Su qualified as a Legal Executive in 1997 and obtained her Fellowship in the same year. She worked in the legal profession in the Runcorn area since 1986 and has extensive knowledge of the local and surrounding areas. She joined Widdows Mason in February 2007.

She deals with all areas of private client work including Wills, Probate, Court of Protection and Lasting Powers of Attorney. Home visits are available to those that our unable to attend at the office. She also deals with all of Residential Property, including sales, purchases, leases, re-mortgages and equity release.

When out of the office Su enjoys spending time with her family and friends. She can also usually be found either at Anfield watching her beloved Liverpool FC for whom she holds a season ticket or watching many other sports on TV.

<tbody>
</tbody>
Wills, Probate, Residential Property

<tbody>
</tbody>
Ursula joined Widdows Mason in 1996 and qualified as a Licensed Conveyancer in April 2009 having previously worked as a Legal Secretary, and then as a Paralegal whilst undertaking a course of study at The Manchester College of Arts & Technology in the evenings. During a period of 6 years she took examinations with the Council of Licensed Conveyancing culminating in her obtaining her first Licence in 2009.<br /><br />The Council for Licensed Conveyancers (CLC) is the regulatory body for Licensed Conveyancers who are qualified specialist property lawyers. All conveyancing - essentially the legal processes involved in transferring buildings and/or land from one owner to another and dealing with the financial transactions - was the sole responsibility of solicitors until 1987.

<tbody>
</tbody>
Conveyancing, Property

<tbody>
</tbody>
Janet initially joined Widdows Mason as a secretary to the Senior Partner at our Westhoughton office more than 20 years ago. Proving herself more than capable, Janet was promoted to the role of Conveyancing Executive, dealing with a caseload of Residential Conveyancing matters. Following further expansion in our Commercial Department, Janet now works from our Leigh Office with John Bullough, a Consultant with the Firm.

<tbody>
</tbody>
Residential Conveyancing

<tbody>
</tbody>
Martyn began working for Chester City Council 1973, followed by periods at other local councils in North Wales, mainly specialising in serious organised crime, criminal law, fraud and homelessness law.<br /><br />After retirement from local government, he took his expertise in housing, welfare benefits and debt counselling to E.J. Bamforth Solicitors in Chester. In particular, he has specialised in defending tenants who have been taken to court for rent arrears or anti-social behaviour and homelessness.<br /><br />Martyn has also represented his clients at reviews regarding introductory and demoted tenancies and at reviews brought under the provisions of the Homeless Persons Act.<br /><br />Martyn is now the supervisor for the housing, welfare benefits and debt legal aid contract at Widdows Mason covering the Cheshire West and Chester procurement area.<br /><br />In his spare time, Martyn enjoys mountain walking and camping. He is also a keen traveller and regularly visits countries abroad, including Bulgaria.

<tbody>
</tbody>
serious organised crime, criminal law, fraud

<tbody>
</tbody>
Vanessa joined Widdows Mason in 1988 and qualified as a legal executive in 1995. She has over 20 years' experience of working in our family and criminal law departments.<br /><br />For the last 10 years Vanessa has specialised in Crown Court work, advising clients and preparing cases for trial before the crown courts at Liverpool, Warrington, Chester, Bolton, Preston and Knutsford.<br /><br />Vanessa has had conduct of the following notable cases at Liverpool Crown Court:<br /><br />R v J: client charged with armed robbery of a building society, found not guilty after trial<br /><br />R v L: client charged with rape of his wife, found not guilty after trial<br /><br />R v R: client charged with assault occasioning actual bodily harm, found not guilty after trial

<tbody>
</tbody>
family, criminal law

<tbody>
</tbody>

<tbody>
</tbody>


<tbody>
</tbody>
Thank you in anticipation.
 

Excel Facts

Excel Joke
Why can't spreadsheets drive cars? They crash too often!
This looks like you are doing an HR function for Keywords in resume's there are several tool or programs on line for a fee of course that HR departments use for this function, it scans the entire document and counts the number of "Key" words that it finds based on the users input. to do this in Excel would be a little difficult IMHO based on the cell/s where your data is imported and the cell/s where your extracted data would go. unless I misunderstood your question I don't think Excel is your best option.
 
Upvote 0
Hi SOS123. Try this macro. If you have additional search criteria, you'll have to add them to the array in the code.
Code:
Option Compare Text
Sub Test()
    Application.ScreenUpdating = False
    Dim bottomA As Integer
    bottomA = Range("A" & Rows.Count).End(xlUp).Row
    Dim rng As Range
    Dim cell As Range
    Dim i As Long
    Set rng = Range("A1:A" & bottomA)
    Dim searchVal
    Range("B1") = ","
    searchVal = Array("Commercial Property", "Residential Property", "Family ", "Divorce ", "Professional Negligence", "Medical Negligence", _
        "Clinical Negligence", "Negligence", "Personal Injury", "Conveyancing ", "Residential Conveyancing", "Commercial Conveyancing", _
        "Domestic Conveyancing", "Crime", "Criminal Law", "Fraud", "Serious Organised Crime", "Wills", "Probate")
    For Each cell In rng
        For i = 0 To UBound(searchVal)
            If cell Like "*" & searchVal(i) & "*" Then
                cell.Offset(0, 1) = cell.Offset(0, 1) & ", " & searchVal(i)
                cell.Offset(0, 1) = Mid(cell.Offset(0, 1), 2, 999)
            End If
        Next i
    Next cell
    Application.ScreenUpdating = False
End Sub
 
Last edited:
Upvote 0
Wow Mumps you are BRILLIANT! Thank you soooo much. I amended the code to the below. There is only one issue - the first few letters of the results seem to be missing for each cell. Any idea why? And also is there a way of removing all duplicate text in the results? Thank you again.

...........................
Option Compare Text
Sub Test()
Application.ScreenUpdating = False
Dim bottomA As Integer
bottomA = Range("M" & Rows.Count).End(xlUp).Row
Dim rng As Range
Dim cell As Range
Dim i As Long
Set rng = Range("M2:M" & bottomA)
Dim searchVal
Range("N2") = ","
searchVal = Array("Commercial Property", "Residential Property", "Family ", "Divorce ", "Professional Negligence", "Medical Negligence", _
"Clinical Negligence", "Negligence", "Personal Injury", "Conveyancing ", "Residential Conveyancing", "Commercial Conveyancing", _
"Immigration", "Asylum", "Civil justice", "Civil Liberties", "Company Commercial", "Competition", "Dispute Resolution", "Litigation", _
"Arbitration", "Mediation", "Employment", "Children", "Divorce", "Ancilliary Relief", "Legal aid", "Money Laundering", "Financial Crime", _
"Private Client", "Regulation", "Corporate Tax", "Tax", "Regulatory Offence", "Insurance", "Insurance Fraud", "Construction", _
"Charity", "banking", "Consumer", "Corporate Finance", "Environmental", "Housing", "Human Rights", "Landlord and Tenant", _
"Landlord & Tenant", "Intellectual Property", "Copyright", "Patent", "Trademark", "Restructuring and Insolvency", "Insolvency", _
"Bankruptcy", "Liquidation", "Individual Voluntary Arrangment", "Company Voluntary Arrangement", "Debt", "Debt Restructuring", "Floatation", _
"Social Welfare", "International Law", "Shipping", "Marine", "Maritime", "Aviation", "Travel", "Holiday", "Grant of Lease", "Leasehold", _
"Share Sale", "Business Acquisition", "Mergers and Acquisition", "Road Traffic Accident", "RTA", "Slip and Fall", "car accident", _
"Matrimonial", "Co-habitation", "Civil Partnerships", "Defamation", "Libel", "Slander", "Pension", "Energy", "Elderly", "Reinsurance", _
"Media", "Outsourcing", "Technology", "Sports", "Education Law", "Civil Action Against the Police", "Regulatory Law", "Prison Law", _
"Neigbour Dispute", "Compromise Agreement", "Work Permit", "Visa", "Accident at Work", _
"Boundary Dispute", "Partnership", "Domestic Conveyancing", "Crime", "Criminal Law", "Fraud", "Serious Organised Crime", "Wills", "Probate")
For Each cell In rng
For i = 0 To UBound(searchVal)
If cell Like "*" & searchVal(i) & "*" Then
cell.Offset(0, 1) = cell.Offset(0, 1) & ", " & searchVal(i)
cell.Offset(0, 1) = Mid(cell.Offset(0, 1), 2, 999)
End If
Next i
Next cell
Application.ScreenUpdating = False
End Sub
 
Upvote 0
Hi SOS123. Because there are commas separating the phrases in column B, a comma is placed at the very beginning of each entry in column B. In order to remove that single comma, I have one line of code that did the trick in each cell in column B. That line may be causing the problem of the missing letters but it should only be removing one character (the comma). I'm not sure why it is removing more than one character. Will any cell in column B ever have more than 999 characters after the macro is run? If you are OK with a lonely comma at the beginning of each cell in column B, the problem could probably be fixed by removing the line:
Code:
cell.Offset(0, 1) = Mid(cell.Offset(0, 1), 2, 999)
When you refer to duplicates, do you mean any duplicates in each cell in column B? It would help if I could see the actual file you are using. Could you upload your file to a free site such as www.box .com ? You can get a link to the file and post it here. If your file has confidential data, you could replace it with generic data. I am by no means an Excel expert, but I'll do my best to find a solution.
 
Upvote 0
Hi Mumps,

Thank you for reverting to me. Column B will never have more than 999 characters. I am ok with a lonely comma though. With regard to duplicates. If in Column A, there are several mentions of a word, Column B result should only display that word once in the result. Also with the 60,000 records, I am not sure if it is too big as there seem to be an error that keeps appearing. When I press DEBUG, I get the VBA editor with the following text highlighted: bottomA = Range("I" & Rows.Count).End(xlUp).Row

I have uploaded the file as per your suggestion to Box. The file is pretty big - 20 Mb. The link is https://app.box.com/s/umwpjajpq2z8ii3191z7

The column that the data needs to be extracted is Column I and the column for the results is column J. The file has two excel worksheets. The Sheet titled Last 10K... is a section of the first worksheet. You will see what I meant by the first few letters being cut out. Mumps, I hugely appreciate the help you have given me.

Many thanks SOS123
 
Upvote 0
Hi SOS123. Try the following macro. Please be patient. There is a lot of data that needs to be processed.
Code:
Option Compare Text
Sub Test()
    Application.ScreenUpdating = False
    Dim bottomI As Integer
    bottomI = Range("I" & Rows.Count).End(xlUp).Row
    Dim Data As Range
    Set Data = Range("I1:I" & bottomI)
    Dim cell As Range
    Dim i As Long
    Dim searchVal
    Dim dic As Object, cell2 As Range, temp As Variant
    Dim ii As Long
    Range("B1") = ","
    searchVal = Array("Commercial Property", "Residential Property", "Family ", "Divorce ", "Professional Negligence", "Medical Negligence", _
        "Clinical Negligence", "Negligence", "Personal Injury", "Conveyancing ", "Residential Conveyancing", "Commercial Conveyancing", _
        "Immigration", "Asylum", "Civil justice", "Civil Liberties", "Company Commercial", "Competition", "Dispute Resolution", "Litigation", _
        "Arbitration", "Mediation", "Employment", "Children", "Divorce", "Ancilliary Relief", "Legal aid", "Money Laundering", "Financial Crime", _
        "Private Client", "Regulation", "Corporate Tax", "Tax", "Regulatory Offence", "Insurance", "Insurance Fraud", "Construction", _
        "Charity", "banking", "Consumer", "Corporate Finance", "Environmental", "Housing", "Human Rights", "Landlord and Tenant", _
        "Landlord & Tenant", "Intellectual Property", "Copyright", "Patent", "Trademark", "Restructuring and Insolvency", "Insolvency", _
        "Bankruptcy", "Liquidation", "Individual Voluntary Arrangment", "Company Voluntary Arrangement", "Debt", "Debt Restructuring", "Floatation", _
        "Social Welfare", "International Law", "Shipping", "Marine", "Maritime", "Aviation", "Travel", "Holiday", "Grant of Lease", "Leasehold", _
        "Share Sale", "Business Acquisition", "Mergers and Acquisition", "Road Traffic Accident", "RTA", "Slip and Fall", "car accident", _
        "Matrimonial", "Co-habitation", "Civil Partnerships", "Defamation", "Libel", "Slander", "Pension", "Energy", "Elderly", "Reinsurance", _
        "Media", "Outsourcing", "Technology", "Sports", "Education Law", "Civil Action Against the Police", "Regulatory Law", "Prison Law", _
        "Neigbour Dispute", "Compromise Agreement", "Work Permit", "Visa", "Accident at Work", _
        "Boundary Dispute", "Partnership", "Domestic Conveyancing", "Crime", "Criminal Law", "Fraud", "Serious Organised Crime", "Wills", "Probate")
    For Each cell In Data
        For i = 0 To UBound(searchVal)
            If cell Like "*" & searchVal(i) & "*" Then
                cell.Offset(0, 1) = cell.Offset(0, 1) & ", " & searchVal(i)
                'cell.Offset(0, 1) = Mid(cell.Offset(0, 1), 2, 999)
            End If
        Next i
    Next cell
    Set dic = CreateObject("scripting.dictionary")
    With dic
        For Each cell2 In Range("J1:J" & Cells(Rows.Count, "J").End(xlUp).Row)
            .RemoveAll
            If Len(cell2.Value) > 0 Then
                temp = Split(Replace(cell2.Value, " ", ""), ",")
                For ii = 0 To UBound(temp)
                    If Not .Exists(temp(ii)) Then .Add temp(ii), temp(ii)
                Next ii
                cell2.Value = Join(.Keys, ",")
            End If
        Next cell2
    End With
    Application.ScreenUpdating = True
End Sub
 
Upvote 0
Hi Mumps I am now getting Runtime Error 6 when I run it for the 60,000 records. However, when I split the records into smaller batches I am not getting this error. Also the text now has no space between the words in the results Example: "Professional Negligence" appears as "ProfessionalNegligence" and so on. I think if this can be resolved; I can split the file into smaller files and run it to get around the Runtime Error 6.

SOS123
 
Upvote 0
Hi SOS123. With a little help from the web, I modified the code and hopefully, it now works. It even removes the initial comma. Please give it a try and let me know how it works out.
Code:
Option Compare Text
Sub test()
    Application.ScreenUpdating = False
    Dim bottomI As Integer
    bottomI = Range("I" & Rows.Count).End(xlUp).Row
    Dim Data As Range
    Set Data = Range("I1:I" & bottomI)
    Dim cell As Range
    Dim cell2 As Range
    Dim i As Long
    Dim searchVal
    Dim starval As String
    Dim finval As String
    Dim strarray() As String
    Dim x As Long
    Dim k As Long
    Range("B1") = ","
    searchVal = Array("Commercial Property", "Residential Property", "Family ", "Divorce ", "Professional Negligence", "Medical Negligence", _
        "Clinical Negligence", "Negligence", "Personal Injury", "Conveyancing ", "Residential Conveyancing", "Commercial Conveyancing", _
        "Immigration", "Asylum", "Civil justice", "Civil Liberties", "Company Commercial", "Competition", "Dispute Resolution", "Litigation", _
        "Arbitration", "Mediation", "Employment", "Children", "Divorce", "Ancilliary Relief", "Legal aid", "Money Laundering", "Financial Crime", _
        "Private Client", "Regulation", "Corporate Tax", "Tax", "Regulatory Offence", "Insurance", "Insurance Fraud", "Construction", _
        "Charity", "banking", "Consumer", "Corporate Finance", "Environmental", "Housing", "Human Rights", "Landlord and Tenant", _
        "Landlord & Tenant", "Intellectual Property", "Copyright", "Patent", "Trademark", "Restructuring and Insolvency", "Insolvency", _
        "Bankruptcy", "Liquidation", "Individual Voluntary Arrangment", "Company Voluntary Arrangement", "Debt", "Debt Restructuring", "Floatation", _
        "Social Welfare", "International Law", "Shipping", "Marine", "Maritime", "Aviation", "Travel", "Holiday", "Grant of Lease", "Leasehold", _
        "Share Sale", "Business Acquisition", "Mergers and Acquisition", "Road Traffic Accident", "RTA", "Slip and Fall", "car accident", _
        "Matrimonial", "Co-habitation", "Civil Partnerships", "Defamation", "Libel", "Slander", "Pension", "Energy", "Elderly", "Reinsurance", _
        "Media", "Outsourcing", "Technology", "Sports", "Education Law", "Civil Action Against the Police", "Regulatory Law", "Prison Law", _
        "Neigbour Dispute", "Compromise Agreement", "Work Permit", "Visa", "Accident at Work", _
        "Boundary Dispute", "Partnership", "Domestic Conveyancing", "Crime", "Criminal Law", "Fraud", "Serious Organised Crime", "Wills", "Probate")
    For Each cell In Data
        If cell <> "NULL" Then
            For i = 0 To UBound(searchVal)
                If cell Like "*" & searchVal(i) & "*" Then
                    cell.Offset(0, 1) = cell.Offset(0, 1) & ", " & searchVal(i)
                End If
            Next i
        End If
    Next cell
    
    For Each cell2 In Range("J1:J" & Cells(Rows.Count, "J").End(xlUp).Row)
        If cell2 <> "" Then
            Erase strarray
            finval = ""
            starval = cell2.Value
            strarray = Split(starval, ",")
            For rw = 0 To UBound(strarray)
                For k = rw + 1 To UBound(strarray)
                    If Trim(strarray(k)) = Trim(strarray(rw)) Then
                        strarray(k) = ""
                    End If
                Next k
            Next rw
            For x = 0 To UBound(strarray)
                If strarray(x) <> "" Then
                    finval = finval & Trim(strarray(x)) & ", "
                End If
            Next x
            finval = Trim(finval)
            If finval <> "" Then
                finval = Left(finval, Len(finval) - 1)
                cell2.Value = finval
            End If
        End If
    Next cell2
    Application.ScreenUpdating = True
End Sub
 
Upvote 0

Forum statistics

Threads
1,215,084
Messages
6,123,024
Members
449,092
Latest member
ikke

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top