Thanks Thanks:  0
Likes Likes:  0
Results 1 to 8 of 8

Thread: Is there a way (formulaic or vba solution) to create a listing of all words in a column and the number

  1. #1
    Board Regular
    Join Date
    Jun 2010
    Posts
    293
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Is there a way (formulaic or vba solution) to create a listing of all words in a column and the number

    of occurrences for each word?

    I have a column of text. There are 500 or so records. What I what is to create a list (on a separate tab) of every unique word in that column and the number of times it occurs. I know this list would be exhaustive, but I'm sure Excel can handle it. I also know that words such as 'the' and 'and' would have the highest number of counts (occurrences), but I can chop them off manually. What I want to end up with is a Top 20 Listing of Unique Words that are not generic.

    Anyone have the solution for this?

    An even more robust approach would be to search all 50 columns in my spreadsheet tracker and return every unique string and a count of the number of occurrences.

    Thanks in advance for thinking about my challenge.


  2. #2
    MrExcel MVP
    Join Date
    May 2009
    Posts
    15,980
    Post Thanks / Like
    Mentioned
    30 Post(s)
    Tagged
    7 Thread(s)

    Default Re: Is there a way (formulaic or vba solution) to create a listing of all words in a column and the number

    By "records" do you mean strings? If so, is each word in a string separated from the next by a space? Is there punctuation like . or , or ; or : .....?

    Can you post a small sample of the contents of a typical cell in the column?
    Joe

    When I was a young man I knew everything. Now that I'm older, I realize I know very little, and what I do know, I tend to forget!

  3. #3
    Board Regular Akashwani's Avatar
    Join Date
    Mar 2009
    Posts
    2,900
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Is there a way (formulaic or vba solution) to create a listing of all words in a column and the number

    Hi,

    If you only have one word per cell, try this......

    How to extract a unique distinct list from a column in excel | Get Digital Help - Microsoft Excel resource

    Sample data and example formulas......

    Sheet3

     ABCDE
    1Data Unique ListCount 
    2Apple Apple3 
    3Orange Orange1 
    4Pear Pear1 
    5Apple Banana2 
    6Banana Grapes2 
    7Banana Plum1 
    8Grapes Melon2 
    9Apple Cherry1 
    10Grapes Lemon1 
    11Plum    
    12Melon    
    13Cherry    
    14Lemon    
    15Melon    
    16     

    Spreadsheet Formulas
    CellFormula
    C2{=INDEX($A$2:$A$15, MATCH(0, COUNTIF($C$1:C1, $A$2:$A$15&""), 0))}
    D2=COUNTIF($A$2:$A$15,C2)
    Formula Array:
    Produce enclosing
    { } by entering
    formula with CTRL+SHIFT+ENTER!


    Excel tables to the web >> Excel Jeanie HTML 4

    The formula in C2 needs entering with ctrl shift enter NOT just enter, you can then copy it down.
    You will obviously need to change the cell references to suit your layout.

    I hope that helps.

    Ak
    A reply is not only helpful to others, but polite to those who have provided a solution!!

    To post sample data go here....
    HtmlMaker
    Excel Genie
    Use Borders

    Upload a Sample File


    Visit here for some EXCELlent videos...

  4. #4
    Board Regular
    Join Date
    Jun 2010
    Posts
    293
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Is there a way (formulaic or vba solution) to create a listing of all words in a column and the number

    Quote Originally Posted by JoeMo View Post
    By "records" do you mean strings? If so, is each word in a string separated from the next by a space? Is there punctuation like . or , or ; or : .....?

    Can you post a small sample of the contents of a typical cell in the column?
    On 22-JAN-2013, PV Systems confirmed to Global Medical Safety (GMS), PV Operations (PV Ops) that the “The Request Investigation Comment” Icon in SCEPTRE used for Automated Referral to Complaint Vigilance (CV), was not functioning as designed. If the case did not generate a change log in Compliant Vigilance’s system due to an issue on the product Quality Management System (PQMS) side of the feed, the automated process will not trigger. Due to this issue, section 7 and section 8 of WI-06466 must be followed for consumer cases.

    They are huge strings of text. Problem statements. No 2 are alike. They are all unique. There are no delimiters as it is conversational annotation (e.g. free form)

  5. #5
    Board Regular
    Join Date
    Jun 2010
    Posts
    293
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Is there a way (formulaic or vba solution) to create a listing of all words in a column and the number

    Not one word. Entire textual paragraphs. Free form. No delimiters. Each cell is totally random and contains up the the Excel limit of cell characters allowed, in some cases.

    On 22-JAN-2013, PV Systems confirmed to Global Medical Safety (GMS), PV Operations (PV Ops) that the “The Request Investigation Comment” Icon in SCEPTRE used for Automated Referral to Complaint Vigilance (CV), was not functioning as designed. If the case did not generate a change log in Compliant Vigilance’s system due to an issue on the product Quality Management System (PQMS) side of the feed, the automated process will not trigger. Due to this issue, section 7 and section 8 of WI-06466 must be followed for consumer cases.

  6. #6
    MrExcel MVP
    Join Date
    May 2009
    Posts
    15,980
    Post Thanks / Like
    Mentioned
    30 Post(s)
    Tagged
    7 Thread(s)

    Default Re: Is there a way (formulaic or vba solution) to create a listing of all words in a column and the number

    Quote Originally Posted by Galena1 View Post
    Not one word. Entire textual paragraphs. Free form. No delimiters. Each cell is totally random and contains up the the Excel limit of cell characters allowed, in some cases.

    On 22-JAN-2013, PV Systems confirmed to Global Medical Safety (GMS), PV Operations (PV Ops) that the “The Request Investigation Comment” Icon in SCEPTRE used for Automated Referral to Complaint Vigilance (CV), was not functioning as designed. If the case did not generate a change log in Compliant Vigilance’s system due to an issue on the product Quality Management System (PQMS) side of the feed, the automated process will not trigger. Due to this issue, section 7 and section 8 of WI-06466 must be followed for consumer cases.
    I'll willing to take a coarse swing at it, treating acronyms as words, things like WI-06466 as words , .... if you think it might be useful.
    Joe

    When I was a young man I knew everything. Now that I'm older, I realize I know very little, and what I do know, I tend to forget!

  7. #7
    Board Regular Akashwani's Avatar
    Join Date
    Mar 2009
    Posts
    2,900
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Is there a way (formulaic or vba solution) to create a listing of all words in a column and the number

    Hi,

    Does this help?....

    Excel udf: Word frequency | Get Digital Help - Microsoft Excel resource

    Example results from the above solution......

    Sheet3

     ABCDE
    1     
    2On 22-JAN-2013, PV Systems confirmed to Global Medical Safety (GMS), PV Operations (PV Ops) that the “The Request Investigation Comment” Icon in SCEPTRE used for Automated Referral to Complaint Vigilance (CV), was not functioning as designed. If the case did not generate a change log in Compliant Vigilance’s system due to an issue on the product Quality Management System (PQMS) side of the feed, the automated process will not trigger. Due to this issue, section 7 and section 8 of WI-06466 must be followed for consumer cases. On1 
    3  22-JAN-2013,1 
    4  PV2 
    5  Systems1 
    6  confirmed1 
    7  to4 
    8  Global1 
    9  Medical1 
    10  Safety1 
    11  (GMS),1 
    12  Operations1 
    13  (PV1 
    14  Ops)1 
    15  that1 
    16  the5 
    17  “The1 
    18  Request1 
    19  Investigation1 
    20  Comment”1 
    21  Icon1 
    22  in2 
    23  SCEPTRE1 
    24  used1 
    25  for2 
    26  Automated1 
    27  Referral1 
    28  Complaint1 
    29  Vigilance1 
    30  (CV),1 
    31  was1 
    32  not3 
    33  functioning1 
    34  as1 
    35  designed.1 
    36  If1 
    37  case1 
    38  did1 
    39  generate1 
    40  a1 
    41  change1 
    42  log1 
    43  Compliant1 
    44  Vigilance’s1 
    45  system1 
    46  due1 
    47  an1 
    48  issue1 
    49  on1 
    50  product1 
    51  Quality1 
    52  Management1 
    53  System1 
    54  (PQMS)1 
    55  side1 
    56  of2 
    57  feed,1 
    58  automated1 
    59  process1 
    60  will1 
    61  trigger.1 
    62  Due1 
    63  this1 
    64  issue,1 
    65  section2 
    66  71 
    67  and1 
    68  81 
    69  WI-064661 
    70  must1 
    71  be1 
    72  followed1 
    73  consumer1 
    74  cases.1 


    Excel tables to the web >> Excel Jeanie HTML 4

    I do not write or understand VBA, so I cannot help you further with this, sorry.
    I guess the VBA code will need to be edited to resolve the issue with punctuation.

    I hope that helps.

    Ak
    A reply is not only helpful to others, but polite to those who have provided a solution!!

    To post sample data go here....
    HtmlMaker
    Excel Genie
    Use Borders

    Upload a Sample File


    Visit here for some EXCELlent videos...

  8. #8
    MrExcel MVP
    Join Date
    May 2009
    Posts
    15,980
    Post Thanks / Like
    Mentioned
    30 Post(s)
    Tagged
    7 Thread(s)

    Default Re: Is there a way (formulaic or vba solution) to create a listing of all words in a column and the number

    Quote Originally Posted by Galena1 View Post
    Not one word. Entire textual paragraphs. Free form. No delimiters. Each cell is totally random and contains up the the Excel limit of cell characters allowed, in some cases.

    On 22-JAN-2013, PV Systems confirmed to Global Medical Safety (GMS), PV Operations (PV Ops) that the “The Request Investigation Comment” Icon in SCEPTRE used for Automated Referral to Complaint Vigilance (CV), was not functioning as designed. If the case did not generate a change log in Compliant Vigilance’s system due to an issue on the product Quality Management System (PQMS) side of the feed, the automated process will not trigger. Due to this issue, section 7 and section 8 of WI-06466 must be followed for consumer cases.
    Here's a module you can try. I have done some cursory testing by filling 12,500 cells in column A with your sample data quoted above (run time approximately 3 minutes on my machine). Limitations are described in a header for the module. Note that the data are assumed to be in Column A starting in A1, and the procedure must be run with the data sheet active.
    Code:
    '---------------------------------------------------------------------------------------
    ' Module    : CountUniqueWordsInRange
    ' Author    : JoeMo
    ' Date      : 3/28/2013
    ' Purpose   : Run from the activesheet. Assumes all data are in column A of the activesheet.
    '             Requires that words are separated by the space character.
    '             Returns all unique words with a count of the number of occurrences of
    '             each word in column A to a new sheet named "Unique Words".
    '             Limit on total word count that can be handled
    '             is 17,179,869,184 (Excel 2007 or later versions). Limit on number of
    '             unique words that can be handled is 1,048,575 (Excel 2007 or later versions).
    '             Treats numbers and acronyms as words.
    '---------------------------------------------------------------------------------------
    
    Sub CountUniqueWordsInRange()
    Dim rS As Range, sSht As Worksheet, dSht As Worksheet, aSht As Worksheet
    Dim Punc As Variant, lRs As Long, lRd As Long, c As Range
    Dim totWords As Long, colCt As Long, vA As Variant, vO() As Variant
    Dim i As Long, j As Long, k As Long, Ct As Long, n As Long
    'define source range
    Set sSht = ActiveSheet
    lRs = sSht.Range("A" & Rows.Count).End(xlUp).Row
    Set rS = sSht.Range("A1", "A" & lRs)
    'Get total word count
    totWords = CountWords(rS)
    'determine how many columns needed to list all words
    colCt = WorksheetFunction.RoundUp(totWords / Rows.Count, 0)
    If colCt > Columns.Count Then
        MsgBox "Too many words to list in one sheet - truncate the input range and try again." & vbNewLine & "Goodbye."
        Exit Sub
    End If
    With Application
    .ScreenUpdating = False
    .Calculation = xlCalculationManual
    .DisplayAlerts = False
    .StatusBar = "PROCESSING YOUR DATA - PLEASE BE PATIENT"
    End With
    'Add sheet to list all words in source range
    On Error Resume Next
    Worksheets("All Words").Delete
    On Error GoTo 0
    ActiveWorkbook.Sheets.Add after:=sSht
    Set aSht = ActiveSheet
    aSht.Name = "All Words"
    'list all words
    Punc = Array(".", ",", ";", ":", "?", "!", "~", "@", "#", "$", _
        "(", ")", "/", Chr(34), Chr(147), Chr(148))
    For Each c In rS
        If Not IsEmpty(c) Then
            vA = Split(Trim(c.Value), " ")
            Ct = Ct + UBound(vA) + 1
            ReDim Preserve vO(1 To Ct)
            For i = LBound(vA) To UBound(vA)
                For j = LBound(Punc) To UBound(Punc)
                    vA(i) = Replace(vA(i), Punc(j), "")
                Next j
            Next i
            For j = LBound(vA) To UBound(vA)
                k = k + 1
                vO(k) = vA(j)
            Next j
        End If
    Next c
    'put all words into All Words sheet
    k = 0
    n = 0
    For i = 1 To colCt
        Do Until n = aSht.Rows.Count Or k = UBound(vO)
            k = k + 1
            n = n + 1
            aSht.Cells(n, i).Value = vO(k)
        Loop
        n = 0
    Next i
    'copy all words to sheet "Unique Words" and remove duplicates
    On Error Resume Next
    Worksheets("Unique Words").Delete
    On Error GoTo 0
    aSht.Copy after:=sSht
    Set dSht = ActiveSheet
    dSht.Name = "Unique Words"
    For i = 1 To colCt
        dSht.Range("A1").CurrentRegion.Columns(i).RemoveDuplicates Columns:=1, Header:=xlNo
    Next i
    'Get word count remaining after dups removal from individual columns
    totWords = CountWords(dSht.Range("A1").CurrentRegion)
    If totWords > dSht.Rows.Count Then
        MsgBox "Too many words remaining for a single column after first pass - Goodbye."
        Exit Sub
    End If
    'Consolidate columns and remove dups again
    With dSht
        lRd = .Range("A" & Rows.Count).End(xlUp).Row + 1
        For i = 2 To colCt
            .Range(Cells(1, i), Cells(Rows.Count, i).End(xlUp)).Cut Destination:=Cells(lRd, 1)
        Next i
    End With
    'Final dups removal from the one remaining column
    dSht.Range("A1").CurrentRegion.RemoveDuplicates Columns:=1, Header:=xlNo
    dSht.Range("A1").EntireRow.Insert
    With dSht.Range("A1:B1")
        .Value = Array("Word", "Count")
        .Font.Bold = True
    End With
    'Get count of each unique word
    lRd = dSht.Range("A" & Rows.Count).End(xlUp).Row
    dSht.Range("B2").FormulaR1C1 = "=COUNTIF('All Words'!C[-1]:C[" & colCt - 2 & "],'Unique Words'!RC[-1])"
    With dSht.Range("B2", "B" & lRd)
        .FillDown
        .Calculate
        .Copy
        .PasteSpecial xlPasteValues
        Application.CutCopyMode = False
    End With
    With dSht.Range("A1:B1")
        .EntireColumn.AutoFit
    End With
    With Application
    .ScreenUpdating = True
    .Calculation = xlCalculationAutomatic
    .DisplayAlerts = True
    .StatusBar = False
    End With
    End Sub
    Function CountWords(R As Range) As Long
    Dim lChars As Long, c As Range, Ct As Long
    For Each c In R
        Ct = 0
        lChars = Len(Trim(c.Value))
        If lChars = 0 Then
            Ct = 0
        Else
            Ct = Len(Trim(c.Value)) - Len(Replace(Trim(c.Value), " ", "")) + 1
        End If
        CountWords = CountWords + Ct
    Next c
    End Function
    Joe

    When I was a young man I knew everything. Now that I'm older, I realize I know very little, and what I do know, I tend to forget!

Some videos you may like

User Tag List

Like this thread? Share it with others

Like this thread? Share it with others

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •