Counting extracted single words from list.

Alex11

New Member
Joined
Oct 17, 2010
Messages
29
I am attempting to count words extracted from unstructured data. To understand my current method please consider my formulas as indicated below:

="^"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,",","^"),".","^"),"?","^"),"!","^"),";","^"),":","^"),"""","^"),"'","^"),"(","^"),")","^"),",","^"),CHAR(13),"^"),CHAR(10),"^"),CHAR(160),"^")," ","^"),"[","^"),"]","^"),"/","^"),"'","^"),"–","^")&"^"

I use the output from above to obtain unique words by using text to data using ^ as a delimiter and sorting them 1 per row in a single column and removing duplicates.

I then use the following formula to obtain a count

=COUNTIF(Master!$C$2:$C$2335,"*^"&A2&"^*")

Please consider this methodology and indicate how I may improve or make it more efficient.

Thank you
 

Excel Facts

Whats the difference between CONCAT and CONCATENATE?
The newer CONCAT function can reference a range of cells. =CONCATENATE(A1,A2,A3,A4,A5) becomes =CONCAT(A1:A5)
You could make this much more efficient depending on your version of Excel. Nesting substitutes is old-fashioned for those using newest versions of Excel.

It would be a bit easier to help if you have some sample data to work with including expected results.
 
Upvote 0
What is unstructured data? Is it a string? Or is it a text file? Or is it something else?
What is a word? Is it a string separated by spaces? Or is it a string separated by '^' ? Or something else?
Do you need to count any one specific word? Or do you need to count for each word in a line or in a file the number of occurrences in this line or text file?
What should be the output? One number for one word? Or two columns, in one of which there will be words, and in the other their number?
 
Upvote 0
@Alex11 what version of Excel are you using?
I suggest that you update your Account details (or click your user name at the top right of the forum) so helpers always know what Excel version(s) & platform(s) you are using as the best solution often varies by version. (Don’t forget to scroll down & ‘Save’)

Can you post some sample data.
MrExcel has a tool called “XL2BB” that lets you post samples of your data that will allow us to copy/paste it to our Excel spreadsheets, so we can work with the same copy of data that you are. Instructions on using this tool can be found here: XL2BB Add-in

Note that there is also a "Test Here” forum on this board. This is a place where you can test using this tool (or any other posting techniques that you want to test) before trying to use those tools in your actual posts.
 
Upvote 0
Latin phrases temp.xlsx
ABCD
1LatinTranslationLatinTranslation
2(oremus) pro invicem(Let us pray), one for the other; let us pray for each other^^oremus^^pro^invicem^^^Let^us^pray^^^one^for^the^other^^let^us^pray^for^each^other^
3["...", ...] dixit["...", ...] said^^^^^^^^^^^^^^dixit^^^^^^^^^^^^^^^said^
4[Dominica] in albis [depositis][Sunday in Setting Aside the] White Garments^^Dominica^^in^albis^^depositis^^^^Sunday^in^Setting^Aside^the^^White^Garments^
5[in] veritate et caritatein truth and charity^^in^^veritate^et^caritate^^in^truth^and^charity^
6a bene placitofrom one well pleased^a^bene^placito^^from^one^well^pleased^
7a caelo usque ad centrumfrom the sky to the center^a^caelo^usque^ad^centrum^^from^the^sky^to^the^center^
8a capite ad calcemfrom head to heel^a^capite^ad^calcem^^from^head^to^heel^
9a contrariofrom the opposite^a^contrario^^from^the^opposite^
10a Deucalionefrom or since Deucalion^a^Deucalione^^from^or^since^Deucalion^
11a falsis principiis proficiscito set forth from false principles^a^falsis^principiis^proficisci^^to^set^forth^from^false^principles^
12a fortiorifrom the stronger^a^fortiori^^from^the^stronger^
13a maiore ad minusfrom the greater to the smaller^a^maiore^ad^minus^^from^the^greater^to^the^smaller^
14a mari usque ad marefrom sea to sea^a^mari^usque^ad^mare^^from^sea^to^sea^
15a minore ad maiusfrom the smaller to the greater^a^minore^ad^maius^^from^the^smaller^to^the^greater^
Master
Cell Formulas
RangeFormula
C2:D15C2="^"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,",","^"),".","^"),"?","^"),"!","^"),";","^"),":","^"),"""","^"),"'","^"),"(","^"),")","^"),",","^"),CHAR(13),"^"),CHAR(10),"^"),CHAR(160),"^")," ","^"),"[","^"),"]","^"),"/","^"),"'","^"),"–","^")&"^"


I have Office 365
 
Last edited by a moderator:
Upvote 0
Latin phrases temp.xlsx
ABCD
1LatinTranslationLatinTranslation
2(oremus) pro invicem(Let us pray), one for the other; let us pray for each other^^oremus^^pro^invicem^^^Let^us^pray^^^one^for^the^other^^let^us^pray^for^each^other^
3["...", ...] dixit["...", ...] said^^^^^^^^^^^^^^dixit^^^^^^^^^^^^^^^said^
4[Dominica] in albis [depositis][Sunday in Setting Aside the] White Garments^^Dominica^^in^albis^^depositis^^^^Sunday^in^Setting^Aside^the^^White^Garments^
5[in] veritate et caritatein truth and charity^^in^^veritate^et^caritate^^in^truth^and^charity^
6a bene placitofrom one well pleased^a^bene^placito^^from^one^well^pleased^
7a caelo usque ad centrumfrom the sky to the center^a^caelo^usque^ad^centrum^^from^the^sky^to^the^center^
8a capite ad calcemfrom head to heel^a^capite^ad^calcem^^from^head^to^heel^
9a contrariofrom the opposite^a^contrario^^from^the^opposite^
10a Deucalionefrom or since Deucalion^a^Deucalione^^from^or^since^Deucalion^
11a falsis principiis proficiscito set forth from false principles^a^falsis^principiis^proficisci^^to^set^forth^from^false^principles^
12a fortiorifrom the stronger^a^fortiori^^from^the^stronger^
13a maiore ad minusfrom the greater to the smaller^a^maiore^ad^minus^^from^the^greater^to^the^smaller^
14a mari usque ad marefrom sea to sea^a^mari^usque^ad^mare^^from^sea^to^sea^
15a minore ad maiusfrom the smaller to the greater^a^minore^ad^maius^^from^the^smaller^to^the^greater^
Master
Cell Formulas
RangeFormula
C2:D15C2="^"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,",","^"),".","^"),"?","^"),"!","^"),";","^"),":","^"),"""","^"),"'","^"),"(","^"),")","^"),",","^"),CHAR(13),"^"),CHAR(10),"^"),CHAR(160),"^")," ","^"),"[","^"),"]","^"),"/","^"),"'","^"),"–","^")&"^"
Latin phrases temp.xlsx
AB
1WordCount
2in204
3non168
4et150
5est117
6ad95
7ex69
8pro50
9per46
10veritas42
11sine39
12quod38
13qui33
14sed33
15ut33
16de32
17omnia31
18semper31
19Dei30
20Deus28
21a27
22pax26
Latin words
Cell Formulas
RangeFormula
B2:B22B2=COUNTIF(Master!$C$2:$C$2335,"*^"&A2&"^*")
 
Upvote 0
What is unstructured data? Is it a string? Or is it a text file? Or is it something else?
What is a word? Is it a string separated by spaces? Or is it a string separated by '^' ? Or something else?
Do you need to count any one specific word? Or do you need to count for each word in a line or in a file the number of occurrences in this line or text file?
What should be the output? One number for one word? Or two columns, in one of which there will be words, and in the other their number?
Unstructured data in this instance is a phrase or sentence within a cell.
A word is either the first character to the first space, the text enclosed by two spaces or the last space to the end of the cell (excluding punctuation).
I want to count every instance of a word in a cell and range.
two columns, in one of which there will be words, and in the other their number
 
Upvote 0
To start with you could use this on the Master sheet
Excel Formula:
="^"&TEXTJOIN("^",,TEXTSPLIT(SUBSTITUTE(SUBSTITUTE(A2,CHAR(13),","),CHAR(160),","),{",",".","?","!",";",":","""","'","(",")"," ","[","]","/","'","-"},CHAR(10)))&"^"
 
Upvote 0
Solution
To start with you could use this on the Master sheet
Excel Formula:
="^"&TEXTJOIN("^",,TEXTSPLIT(SUBSTITUTE(SUBSTITUTE(A2,CHAR(13),","),CHAR(160),","),{",",".","?","!",";",":","""","'","(",")"," ","[","]","/","'","-"},CHAR(10)))&"^"
Thank you

Is there a better way to split my "words" considering that I need to strip punctuation, non printable characters etc.
The other challenge is that I want to identify by means of countif an entire "word" not a portion thereof. That is the purpose of my ^ inclusions.
I am looking for a cleverer way to the differentiate the "app" from the "apple" if that makes sense?
 
Upvote 0
That's what the formula I suggested does, at almost half the length of yours. ;)
 
Upvote 0

Forum statistics

Threads
1,214,942
Messages
6,122,367
Members
449,080
Latest member
Armadillos

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top