The list of frequent words used in a text

dscheste

Board Regular
Joined
Jul 10, 2004
Messages
206
Hello, gurus:)

I have read the post here about an alphabetically presented list of frequently used words.

The post riveted my attention and I decided to play around with an idea of creating a code, which will analyse how many times a word was used in a text and then sort the list having the most frequent words on top of the list.

As the text might be large I dismissed the idea of doing it in Excel and had to go to the "bastard child" ™ by Tom Urtis - Microsoft Word.

The code works like this:

1. Copy any text and paste it as unformatted text in to the document (in this case named Frequent_words.doc).
2. Run the macro.
3. Macro deletes the words it has already counted and then tells how many words were processed and how long it took.
4. The stats are in the Stats.doc, which is being saved by default in root.

I would like to ask you to comment on the code, any ideas how to make it run faster, shorter code ideas?

Appreciate your feedback:)

Here is the code: (VBE in Word)

<font face=Courier New><SPAN style="color:#00007F">Sub</SPAN> Frequent_words()
<SPAN style="color:#00007F">Dim</SPAN> MyRange, WordOrder, Word, Frequency, Statistics, stimer, ftimer, Text, cellWordCount, prevcharacter, lastspace, i, nextCharacter
stimer = Timer
<SPAN style="color:#00007F">With</SPAN> Application
.ScreenUpdating = <SPAN style="color:#00007F">False</SPAN> <SPAN style="color:#007F00">' we do not wanna see what is happening</SPAN>
    <SPAN style="color:#00007F">With</SPAN> Selection <SPAN style="color:#007F00">'We have to move to the end of the document and add a space in order not to loose the last word when parsing</SPAN>
    .EndKey unit:=wdStory <SPAN style="color:#007F00">'Goes to the end</SPAN>
    .InsertAfter (" ") <SPAN style="color:#007F00">'inserts a space</SPAN>
    <SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
<SPAN style="color:#00007F">Set</SPAN> MyRange = ActiveDocument.Content
<SPAN style="color:#00007F">With</SPAN> MyRange.Find <SPAN style="color:#007F00">' we have to pre-format the text and get rid of the paragraphs</SPAN>
        .Text = "^p"
        .Replacement.Text = " "
        .Forward = <SPAN style="color:#00007F">True</SPAN>
        .Wrap = wdFindContinue
        .Format = <SPAN style="color:#00007F">False</SPAN>
        .MatchCase = <SPAN style="color:#00007F">False</SPAN>
        .MatchWholeWord = <SPAN style="color:#00007F">False</SPAN>
        .MatchWildcards = <SPAN style="color:#00007F">False</SPAN>
        .MatchSoundsLike = <SPAN style="color:#00007F">False</SPAN>
        .MatchAllWordForms = <SPAN style="color:#00007F">False</SPAN>
    <SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
Selection.Find.Execute Replace:=wdReplaceAll
Text = Trim(MyRange) <SPAN style="color:#007F00">'Now we trim the text a bit to work with nice text:)</SPAN>
cellWordCount = 1
prevcharacter = ""
lastspace = 1

            Documents.Add DocumentType:=wdNewBlankDocument <SPAN style="color:#007F00">' Here we would record our findings</SPAN>
            ChangeFileOpenDirectory "C:\" <SPAN style="color:#007F00">'This path can be changed, when changing the name rename the window as well</SPAN>
            ActiveDocument.SaveAs Filename:="Stats.doc", FileFormat:= _
            wdFormatDocument, LockComments:=False, Password:="", AddToRecentFiles:= _
            <SPAN style="color:#00007F">True</SPAN>, WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:= _
            <SPAN style="color:#00007F">False</SPAN>, SaveNativePicture<SPAN style="color:#00007F">For</SPAN>mat:=False, SaveFormsData:=<SPAN style="color:#00007F">False</SPAN>, _
            SaveAsAOCELetter:=False
<SPAN style="color:#00007F">Set</SPAN> Statistics = Windows("Stats")
<SPAN style="color:#007F00">'Get back to our sheep</SPAN>
Windows("Frequent_Words").Activate
For i = 1 <SPAN style="color:#00007F">To</SPAN> Len(Text)
<SPAN style="color:#007F00">' we will have to loop through the text and analyze every character, a space will mean new word is following</SPAN>
nextCharacter = Mid(Text, i, 1)
    <SPAN style="color:#00007F">If</SPAN> (nextCharacter = " ") And (prevcharacter <> " ") <SPAN style="color:#00007F">Then</SPAN>
        Word = Trim(Mid(Text, lastspace, i - lastspace)) <SPAN style="color:#007F00">' that is our actual word</SPAN>
        lastspace = i
        <SPAN style="color:#00007F">With</SPAN> Active<SPAN style="color:#00007F">Do</SPAN>cument.Content.Find
                .ClearFormatting
            Do <SPAN style="color:#00007F">While</SPAN> .Execute(FindText:=Word, Forward:=True, Format:=True, ReplaceWith:="", MatchWholeWord:=<SPAN style="color:#00007F">True</SPAN>) = True
                Frequency = Frequency + 1
                <SPAN style="color:#007F00">' we have found all the instances of the word and now mark each time the loop revolves</SPAN>
                <SPAN style="color:#00007F">Loop</SPAN>
<SPAN style="color:#00007F">With</SPAN> Statistics.Selection
.InsertAfter (Frequency)
.MoveRight unit:=wdCharacter, Count:=1
.InsertAfter (" ")
.MoveRight unit:=wdCharacter, Count:=1
.InsertAfter (Word)
.MoveDown unit:=wdLine, Count:=1
.InsertParagraph
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
Frequency = 1
        <SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
        cellWordCount = cellWordCount + 1 <SPAN style="color:#007F00">' count the actual number of words</SPAN>
    <SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">If</SPAN>
    prevcharacter = nextCharacter <SPAN style="color:#007F00">' proceed to the next character</SPAN>
<SPAN style="color:#00007F">Next</SPAN> i
<SPAN style="color:#00007F">If</SPAN> Len(Text) = 0 <SPAN style="color:#00007F">Then</SPAN>
    cellWordCount = 0
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">If</SPAN>
<SPAN style="color:#007F00">'Now we sort the results descending</SPAN>
            Statistics.Selection.WholeStory
            Statistics.Selection.Sort ExcludeHeader:=False, FieldNumber:="Field1", _
                    SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderDescending, Separator:= _
                    wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _
                    :=wdEnglishUS
            Statistics.Selection.HomeKey unit:=wdStory
.ScreenUpdating = True
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
ftimer = Timer
<SPAN style="color:#007F00">'A bit of show-off</SPAN>
MsgBox Str(cellWordCount) & " words processed in " & ftimer - stimer & " seconds."
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">Sub</SPAN>


</FONT>


P.S. The code works for every alphanumerical language. Just tried in Russian and it worked:)
 

Excel Facts

Can a formula spear through sheets?
Use =SUM(January:December!E7) to sum E7 on all of the sheets from January through December
Hi!
Im not sure which of this code runs faster. Just try in your machine.
But as I look into your code, you loop thru each character to find a word which will take you too long for long document.

Just try this and feedback.

<font face=Courier New><SPAN style="color:#00007F">Sub</SPAN> CountFreq()
<SPAN style="color:#00007F">Dim</SPAN> NonWord
NonWord = "!:;?.,<>()[]{}"
<SPAN style="color:#00007F">Dim</SPAN> Tmp
<SPAN style="color:#00007F">Dim</SPAN> uniqlist
<SPAN style="color:#00007F">Dim</SPAN> Newdoc <SPAN style="color:#00007F">As</SPAN> Document
st = Timer
uniqlist = "+"
Application.ScreenUpdating = <SPAN style="color:#00007F">False</SPAN>
Selection.Find.Execute Replace:=wdReplaceAll
txt = (Trim(ThisDocument.Content))
<SPAN style="color:#00007F">For</SPAN> i = 1 <SPAN style="color:#00007F">To</SPAN> 31
    txt = Trim(Replace(txt, Chr(i), " ", 1, -1, vbTextCompare)) <SPAN style="color:#007F00">'non printable characters</SPAN>
<SPAN style="color:#00007F">Next</SPAN> i
<SPAN style="color:#00007F">For</SPAN> i = 1 <SPAN style="color:#00007F">To</SPAN> Len(NonWord)
    txt = Trim(Replace(txt, Mid(NonWord, i, 1), "", 1, -1, vbTextCompare))  <SPAN style="color:#007F00">'</SPAN>
<SPAN style="color:#00007F">Next</SPAN> i
<SPAN style="color:#00007F">Do</SPAN> <SPAN style="color:#00007F">Until</SPAN> (Len(txt) - Len(Replace(UCase(txt), "  ", " ", 1, -1, vbTextCompare))) = 0
    txt = Trim(Replace(txt, "  ", " ", 1, -1, vbTextCompare))
<SPAN style="color:#00007F">Loop</SPAN>
Tmp = Split(txt)

<SPAN style="color:#00007F">Set</SPAN> Newdoc = Documents.Add(DocumentType:=wdNewBlankDocument)

<SPAN style="color:#00007F">For</SPAN> <SPAN style="color:#00007F">Each</SPAN> itxt <SPAN style="color:#00007F">In</SPAN> Tmp
    <SPAN style="color:#00007F">If</SPAN> InStr(1, (uniqlist), "+" & UCase(itxt) & "+") > 0 <SPAN style="color:#00007F">Then</SPAN>
        <SPAN style="color:#00007F">GoTo</SPAN> nxttxt
    <SPAN style="color:#00007F">Else</SPAN>
        uniqlist = uniqlist & UCase(itxt) & "+"
        
        cnt = (Len(txt) - Len(Replace(UCase(txt), UCase(itxt), "", 1, -1, vbTextCompare))) / Len(itxt)
    <SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">If</SPAN>

Newdoc.Range(Newdoc.Range.Start, Newdoc.Range.End - 1).InsertAfter Format(cnt, "000") & Chr(9) & Trim(itxt) & Chr(13)

nxttxt:
<SPAN style="color:#00007F">Next</SPAN> itxt

Newdoc.Range.Sort excludeheader:=False, fieldnumber:="Field1", sortorder:=wdSortOrderDescending
Application.ScreenUpdating = <SPAN style="color:#00007F">True</SPAN>
st = Timer - st

wrdcnt = (Len(uniqlist) - 1 - Len(Replace(uniqlist, "+", "")))
MsgBox "There are " & wrdcnt & " Unique words found in " & st & "Seconds"
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">Sub</SPAN></FONT>
 
Upvote 0
Hello Sixth Sence,

sorry for not getting back to you earlier.

I have let run both codes and for documents with 0 > 5000 characters they work with approximately the same speed.

When I launched a test on a John Grisham's novel..... your is much more efficient. I will definitely go with your code.

Thank you for the code, and especially for this idea:

txt = Trim(Replace(txt, Mid(NonWord, i, 1), "", 1, -1, vbTextCompare))

beautiful:)

Have a nice one.

dscheste
 
Upvote 0

Forum statistics

Threads
1,216,052
Messages
6,128,511
Members
449,455
Latest member
jesski

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top