Thanks Thanks:  0
Likes Likes:  0
Page 1 of 3 123 LastLast
Results 1 to 10 of 21

Thread: Count words, but not HTML tags

  1. #1
    New Member
    Join Date
    Jun 2016
    Posts
    26
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Count words, but not HTML tags

    Hi
    i want to count all words in a cell, but not those, which are placed between < and >. Or, count all words, which are placed between > and <.
    How can i accomplish this? With regex it is a nobrainer, but in Excel - no clue

  2. #2
    MrExcel MVP
    Moderator

    Tracy's new best bud
    Jon von der Heyden's Avatar
    Join Date
    Apr 2004
    Location
    London, UK
    Posts
    10,656
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    4 Thread(s)

    Default Re: Count words, but not HTML tags

    Have you considered a UDF using regex? If you were to use regex, what would you use?
    Regards,
    Jon von der Heyden

    Posting guidelines | Forum rules | FAQs
    English is a weird language. It can be understood through tough thorough thought, though!

  3. #3
    MrExcel MVP
    Moderator

    Tracy's new best bud
    Jon von der Heyden's Avatar
    Join Date
    Apr 2004
    Location
    London, UK
    Posts
    10,656
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    4 Thread(s)

    Default Re: Count words, but not HTML tags

    I think I have a formula that works. I think the best way is to create a name to hold the text, after being split into individual elements.

    Hopefully this illustrates the approach using name in column C, but I have also illustrated using a single formula approach (no names) in column B.

    Excel 2010

    ABC
    1Contentcount wordscount words using name
    2<tag attribute="">some words</tag> <tag attribute="">some more words</tag><tag attribute="">yet more words<tag>88

    Sheet1



    Array Formulas
    CellFormula
    B2{=SUMPRODUCT(IF(LEN(TRIM(MID(SUBSTITUTE(SUBSTITUTE(A2,"<",REPT(" ",255)&"<"),">",">"&REPT(" ",255)),ROW(INDEX(A:A,1,1):INDEX(A:A,255,1))*255,255))),IF(LEFT(TRIM(MID(SUBSTITUTE(SUBSTITUTE(A2,"<",REPT(" ",255)&"<"),">",">"&REPT(" ",255)),ROW(INDEX(A:A,1,1):INDEX(A:A,255,1))*255,255)),1)<>"<",1+LEN(TRIM(MID(SUBSTITUTE(SUBSTITUTE(A2,"<",REPT(" ",255)&"<"),">",">"&REPT(" ",255)),ROW(INDEX(A:A,1,1):INDEX(A:A,255,1))*255,255)))-LEN(SUBSTITUTE(TRIM(MID(SUBSTITUTE(SUBSTITUTE(A2,"<",REPT(" ",255)&"<"),">",">"&REPT(" ",255)),ROW(INDEX(A:A,1,1):INDEX(A:A,255,1))*255,255))," ","")),0),0))}
    C2{=SUMPRODUCT(IF(LEN(SplitTag),IF(LEFT(SplitTag,1)<>"<",1+LEN(SplitTag)-LEN(SUBSTITUTE(SplitTag," ","")),0),0))}
    Entered with Ctrl+Shift+Enter. If entered correctly, Excel will surround with curly braces {}.
    Note: Do not try and enter the {} manually yourself

    Workbook Defined Names
    NameRefers To
    SplitTag=TRIM(MID(SUBSTITUTE(SUBSTITUTE(Sheet1!A2,"<",REPT(" ",255)&"<"),">",">"&REPT(" ",255)),ROW(INDEX(Sheet1!A:A,1,1):INDEX(Sheet1!A:A,255,1))*255,255))

    Regards,
    Jon von der Heyden

    Posting guidelines | Forum rules | FAQs
    English is a weird language. It can be understood through tough thorough thought, though!

  4. #4
    MrExcel MVP
    Moderator
    Peter_SSs's Avatar
    Join Date
    May 2005
    Location
    Macksville, Australia
    Posts
    39,410
    Post Thanks / Like
    Mentioned
    78 Post(s)
    Tagged
    16 Thread(s)

    Default Re: Count words, but not HTML tags

    Quote Originally Posted by chilly_bang View Post
    i want to count all words in a cell, but not those, which are placed between < and >. Or, count all words, which are placed between > and <.
    Could we see 8-10 rows of representative sample data (with not too many 'words' in each cell) and the expected results?
    Last edited by Peter_SSs; Feb 28th, 2017 at 06:11 AM.
    Hope this helps, good luck.
    Peter
    Excel 365 - Windows 10
    - Want to help your helpers by posting a small, copyable, screen shot directly in your post? Look here
    - If posting vba code, please use Code Tags - like this [code]Paste code here[/code] - or use the VBHTML Maker
    - Read: Forum Rules, Forum Use Guidelines, & FAQ

  5. #5
    MrExcel MVP Rick Rothstein's Avatar
    Join Date
    Apr 2011
    Location
    New Jersey, USA
    Posts
    33,903
    Post Thanks / Like
    Mentioned
    71 Post(s)
    Tagged
    28 Thread(s)

    Default Re: Count words, but not HTML tags

    If you want to try a UDF, I think this may work for you...
    Code:
    Function NonTagWordCount(S As String) As Long Dim X As Long, Phrases() As String Phrases = Split(Replace(S, "<", ">"), ">") For X = 0 To UBound(Phrases) Step 2 NonTagWordCount = NonTagWordCount + UBound(Split(Application.Trim(Phrases(X)))) + 1 Next End Function
    Rick's "mini" blog... http://www.excelfox.com/forum/f22/
    .
    Want to post a small screen shot? See Part B here.

  6. #6
    New Member
    Join Date
    Jun 2016
    Posts
    26
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: Count words, but not HTML tags

    Quote Originally Posted by Peter_SSs View Post
    Could we see 8-10 rows of representative sample data (with not too many 'words' in each cell) and the expected results?
    Sure, here a pair:
    Code:
    a1;
    a2;

    @ Jon von der Heyden: On using your forumla from B2 i get always =, not dependently of content in A2. Is it because i'm on Office 2007?
    @
    Rick Rothstein: how is this code usable?
    Last edited by chilly_bang; Feb 28th, 2017 at 06:51 AM.

  7. #7
    MrExcel MVP Rick Rothstein's Avatar
    Join Date
    Apr 2011
    Location
    New Jersey, USA
    Posts
    33,903
    Post Thanks / Like
    Mentioned
    71 Post(s)
    Tagged
    28 Thread(s)

    Default Re: Count words, but not HTML tags

    @ [/COLOR]Rick Rothstein what do you mean with this code?[/QUOTE]
    What I posted is called a UDF (user defined function). Perhaps this write-up which I have posted in the past will help...

    HOW TO INSTALL UDFs
    ------------------------------------
    If you are new to UDFs, they are easy to install and use. To install it, simply press ALT+F11 to go into the VB editor and, once there, click Insert/Module on its menu bar, then copy/paste the code I posted in Message #5 into the code window that just opened up. That's it.... you are done. You can now use NonTagWordCount just like it was a built-in Excel function. For example,

    =NonTagWordCount(A1)

    If you are using XL2007 or above, make sure you save your file as an "Excel Macro-Enabled Workbook (*.xlsm) and answer the "do you want to enable macros" question as "yes" or "OK" (depending on the button label for your version of Excel) the next time you open your workbook.
    Rick's "mini" blog... http://www.excelfox.com/forum/f22/
    .
    Want to post a small screen shot? See Part B here.

  8. #8
    MrExcel MVP
    Moderator

    Tracy's new best bud
    Jon von der Heyden's Avatar
    Join Date
    Apr 2004
    Location
    London, UK
    Posts
    10,656
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    4 Thread(s)

    Default Re: Count words, but not HTML tags

    Quote Originally Posted by chilly_bang View Post
    Sure, here a pair:
    @ Jon von der Heyden: On using your forumla from B2 i get always =, not dependently of content in A2. Is it because i'm on Office 2007?
    I think that your result includes tags which is being parsed by the forum, so I can't remark on why the formula isn't working. Although you can probably tell from my first post that I would opt for a UDF. Either using Rick's, or if you know regex then you could apply the regex logic to your own UDF.

    I should point out though that my formula is an array formula. Note the instruction in my post to type the formula with an "=" and to commit it with Control+Shift+Enter.
    Last edited by Jon von der Heyden; Feb 28th, 2017 at 07:13 AM.
    Regards,
    Jon von der Heyden

    Posting guidelines | Forum rules | FAQs
    English is a weird language. It can be understood through tough thorough thought, though!

  9. #9
    MrExcel MVP
    Moderator
    Peter_SSs's Avatar
    Join Date
    May 2005
    Location
    Macksville, Australia
    Posts
    39,410
    Post Thanks / Like
    Mentioned
    78 Post(s)
    Tagged
    16 Thread(s)

    Default Re: Count words, but not HTML tags

    I think this is the sample data you tried to post. (When posting HTML text, click Go Advanced at the bottom right of the Quick Reply window and look down for the option to turn off HTML for your post)

    a1;<div class="brandingInfo_wrapper"><div class="footer_bandSingleText"><div id="band_info" class="dark"><h1 class="h2">x-men</h1><div class="textBlock"><p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos e <strong>X-Men</strong> Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum <strong><a href="/lis_Wolverine_bandbrand/">Wolverine</a></strong>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. <strong>X-Men: Apocalypse</strong>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.</p></div></div></div></div>

    a2;<div class="footer_bandSingleText"><div id="band_info" class="dark"><h1 class="h2">star trek</h1><div class="textBlock"><p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. <strong>Star Trek</strong> Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. <strong>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. </strong>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. <strong>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. </strong> Lorem <strong>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. </strong> Ipsum <strong>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. </p></div></div></div>


    So that is two sample texts, what about the last part of my question: "the expected results"?
    .. and explain those expected results please.
    Hope this helps, good luck.
    Peter
    Excel 365 - Windows 10
    - Want to help your helpers by posting a small, copyable, screen shot directly in your post? Look here
    - If posting vba code, please use Code Tags - like this [code]Paste code here[/code] - or use the VBHTML Maker
    - Read: Forum Rules, Forum Use Guidelines, & FAQ

  10. #10
    MrExcel MVP
    Moderator

    Tracy's new best bud
    Jon von der Heyden's Avatar
    Join Date
    Apr 2004
    Location
    London, UK
    Posts
    10,656
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    4 Thread(s)

    Default Re: Count words, but not HTML tags

    Well done Peter pulling that out!

    @chilly_bank: Please do not overlook Peter's request for desired results.

    Mine is more of a cautionary note: I believe there is a flaw in my formula, but I'm not feeling the love for the formula-based solution anyway so not feeling very inclined to debug it. Ricks UDF returns 92 (C1), which by my estimation is the correct result. Herewith a comparison of Rick's UDF and my original formula suggestion in B1.

    Excel 2010

    ABC
    1<div class="brandingInfo_wrapper"><div class="footer_bandSingleText"><div id="band_info" class="dark"><h1 class="h2">x-men</h1><div class="textBlock"><p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos e <strong>X-Men</strong> Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum <strong><a href="/lis_Wolverine_bandbrand/">Wolverine</a></strong>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. <strong>X-Men: Apocalypse</strong>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.</p></div></div></div></div>9492

    Sheet1



    Worksheet Formulas
    CellFormula
    C1=NonTagWordCount(A1)

    Array Formulas
    CellFormula
    B1{=SUMPRODUCT(IF(LEN(SplitTag),IF(LEFT(SplitTag,1)<>"<",1+LEN(SplitTag)-LEN(SUBSTITUTE(SplitTag," ","")),0),0))}
    Entered with Ctrl+Shift+Enter. If entered correctly, Excel will surround with curly braces {}.
    Note: Do not try and enter the {} manually yourself

    Workbook Defined Names
    NameRefers To
    SplitTag=TRIM(MID(SUBSTITUTE(SUBSTITUTE(Sheet1!XFD1,"<",REPT(" ",255)&"<"),">",">"&REPT(" ",255)),ROW(INDEX(Sheet1!XFD:XFD,1,1):INDEX(Sheet1!XFD:XFD,255,1))*255,255))

    Last edited by Jon von der Heyden; Feb 28th, 2017 at 08:30 AM.
    Regards,
    Jon von der Heyden

    Posting guidelines | Forum rules | FAQs
    English is a weird language. It can be understood through tough thorough thought, though!

Some videos you may like

User Tag List

Like this thread? Share it with others

Like this thread? Share it with others

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •