Extract Multiple Sections of Text in a String, Between Two Repeating Words
Page 2 of 4 FirstFirst 1234 LastLast
Results 11 to 20 of 37

Thread: Extract Multiple Sections of Text in a String, Between Two Repeating Words
Thanks Thanks: 0 Likes Likes: 0

  1. #11
    Board Regular
    Join Date
    Jun 2014
    Location
    Oakland, CA
    Posts
    5,146
    Post Thanks / Like
    Mentioned
    42 Post(s)
    Tagged
    2 Thread(s)

    Default Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

    Quote Originally Posted by blafarm View Post
    Thank you very much jtakw!

    You are clearly very talented -- and very generous with your time.

    Any thoughts on the "Text result of SUBSTITUTE is longer than the limit of 50000 characters" error described above?

    Thanks again.
    You're welcome, thanks for the feedback.

    This updated formula should Not error out due to character limits...
    For Excel, as long as your Original text string is no more than, say 10,000 characters, you should be fine.
    I don't use Google sheets, but if the limit is 50,000, then as long as your Original text string is no more than around 15,000 characters, you should be fine.

  2. #12
    Board Regular
    Join Date
    Oct 2018
    Posts
    59
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

    Quote Originally Posted by jtakw View Post
    You're welcome, thanks for the feedback.
    Quote Originally Posted by jtakw View Post

    This updated formula should Not error out due to character limits...
    For Excel, as long as your Original text string is no more than, say 10,000 characters, you should be fine.
    I don't use Google sheets, but if the limit is 50,000, then as long as your Original text string is no more than around 15,000 characters, you should be fine.


    Thank you very much for the prompt reply.

    But, I have to ask you a tough question:

    In the hours since your first post, I have spent the day building-out a very complicated page based on your first excellent solution.

    It turns-out that creating columns of the individual data points is actually working very well for me (Necessity is the Mother of Invention!).

    That's why I indicated that my only remaining problem was the error message.

    And due to that error message, I am wondering if you would be kind enough to provide a version of your updated formula ("that should not error out due to character limits") but that functions the same way as your first formula -- which created a single column of identical data points.

    Is that possible? If so, I would be very grateful.
    Last edited by blafarm; May 8th, 2019 at 06:54 PM.

  3. #13
    Board Regular
    Join Date
    Jun 2014
    Location
    Oakland, CA
    Posts
    5,146
    Post Thanks / Like
    Mentioned
    42 Post(s)
    Tagged
    2 Thread(s)

    Default Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

    Can you show a sample of your new layout, and explain how you want the text strings extracted (so there's no guess work)?

    Thanks.

  4. #14
    Board Regular
    Join Date
    Oct 2018
    Posts
    59
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

    Btw:

    I forgot to mention that your new formula more than doubles my results before hitting the error.
    So, it is very effective and very useful.
    Just hoping a version can be easily created that performs the same as your first generous suggestion.
    Thank you.

  5. #15
    Board Regular
    Join Date
    Oct 2018
    Posts
    59
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

    Quote Originally Posted by jtakw View Post
    Can you show a sample of your new layout, and explain how you want the text strings extracted (so there's no guess work)?

    Thanks.
    Absolutely jtakw. Sorry if my last post was confusing.
    And thanks very much for asking -- and for being receptive to my request.
    Your original formula for creating a vertical column of extracted data was perfect:

    =TRIM(SUBSTITUTE(MID(SUBSTITUTE(SUBSTITUTE(A$1,"headline:",REPT(" ",LEN(A$1))),"source:",REPT(" ",LEN(A$1))),(ROWS(A$2:A2)*2-1)*LEN(A$1),LEN(A$1)),CHAR(34),""))

    And it is that exact formula that represents the basis of the sheet I created today.

    Your original formula actually works better than I could have imagined.
    It allowed me to create a nice block of related data points that are horizontally aligned with each other.

    Here is a screen grab of a very simplified version of the page:




    In the screen grab:

    A2 = Text String
    A5:A11 = Your original formula, isolating the “DateTime”
    B5:B11 = Your original formula, isolating the “Headline”
    C5:C11 = Your original formula, isolating the “URL”
    D5:D11 = Your original formula, isolating the “Source”

    I just pasted your original formula into columns A, B, C, and D -- and customized the "start" and "end" boundaries for the data I wanted to extract.

    Now, your new formula is also great, and very much appreciated. I'm quite sure I will have a use for it in the near future.
    However, for this sheet, having all of the data point categories in one column actually makes it harder to arrive at the layout in the screen grab above.

    But importantly, as you pointed out and as my tests confirm – your new formula structure significantly increases the number of extractions I can achieve before getting the error message.

    So, I'm hoping that there might be a way to modify your original formula, so that it works the exact same way -- but to do so using whatever technique(s) you used in the new formula that allowed it to significantly increase the number of extractions before getting the error.

    Does that make sense?
    If not, I will happily take another stab at explaining it.

    Thanks very much for your continued support.

  6. #16
    Board Regular tyija1995's Avatar
    Join Date
    Feb 2019
    Posts
    581
    Post Thanks / Like
    Mentioned
    12 Post(s)
    Tagged
    4 Thread(s)

    Default Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

    Quote Originally Posted by blafarm View Post
    Thank you for your prompt reply.

    I tried your formula but am getting an error related to the requirement of the pipe argument "|"

    That symbol does not appear in my text strings (please see my post above).

    Thank you.
    Hey, it should work as I have replaced instances of "headline:" that appear in your string, with the pipe, then searched for the pipe, the reason to use a SEARCH(SUBSTITUTE combination is so you can iterate through instances, hence the use of ROW() and dragging it down - what error do you get when trying it? It works on my screen so curious why it doesn't work for you
    √-1 2³ ∑ π
    …And it was delicious!

  7. #17
    Board Regular
    Join Date
    Jun 2014
    Location
    Oakland, CA
    Posts
    5,146
    Post Thanks / Like
    Mentioned
    42 Post(s)
    Tagged
    2 Thread(s)

    Default Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

    So, based on your sample layout in Post # 15, and using your sample data in Post # 5...

    A5 formula copied Across to Column D and Down as far as needed:

    A
    15767
    2[{"datetime":1557328805000 headline:"GE CEO says weaker quarters likely after 'good start' to year" source:"Reuters" url:"https://cloud.iexapis.com/v1/news/article/1e7714c0-93f6-4921-af24-f3122a2e809e" summary:"General Electric Co Chief Executive Larry Culp said on Wednesday the company will likely have weaker quarters in the rest of the year after a surprisingly "good start" in the first quarter." related:"GE" image:"https://cloud.iexapis.com/v1/news/image/1e7714c0-93f6-4921-af24-f3122a2e809e" lang:"en" hasPaywall:false} {"datetime":1557307800000 headline:"GE Investors Gather to Elect Revamped Board Air Gripes" source:"The Wall Street Journal" url:"https://cloud.iexapis.com/v1/news/article/dfb8a33f-9194-493e-9b83-9709dc0d18a9" summary:"General Electric Chairman and CEO Larry Culp who joined the company just a year ago is expected to answer questions from investors at the shareholder meeting in Tarrytown N.Y." related:"GE" image:"https://cloud.iexapis.com/v1/news/image/dfb8a33f-9194-493e-9b83-9709dc0d18a9" lang:"en" hasPaywall:true} {"datetime":1557205509000 headline:"IndiGo is making plans for another aggressive push" source:"The Economic Times India" url:"https://cloud.iexapis.com/v1/news/article/6ffe2450-5e9f-4166-b5f6-a26a79f10908" summary:"by Anurag KotokyIndiGo is in talks with Airbus SE for another large plane order in a sign Asia’s biggest budget carrier has no intention of letting up on a blistering pace of expansion.The Indian airline which adds an aircraft to its fleet every week is in discussions to buy a longer-range version of the European planemaker’s newest narrow-body jet according to Chief Executive Officer Ronojoy Dutta.Existing orders will see IndiGo through the next two years Dutta said in an interview with Bloomberg News Editor-in-Chief John Micklethwait. After that the carrier is considering adding new planes including Airbus’s long-distance A321neo LR and the yet-to-be-released A321 XLR he said.“When we order we’ll order in bulk for sure ” Dutta said in New Delhi in his first interview since becoming CEO of the airline in January. “I can’t give you a tentative number but it will be large. We need longer range.”Buying BigThough Dutta didn’t specify the size of IndiGo’s next purchase it’s likely to be a multi-billion-dollar order based on the company’s history." related:"GE" image:"https://cloud.iexapis.com/v1/news/image/6ffe2450-5e9f-4166-b5f6-a26a79f10908" lang:"en" hasPaywall:false} {"datetime":1557085967000 headline:"General Electric: Larry Culp Is The Man (For The Job)" source:"Seeking Alpha" url:"https://cloud.iexapis.com/v1/news/article/87bdd7f9-1aec-4943-835f-7a436483cede" summary:"General Electric reported strong Q1 2019 results that beat the consensus bottom-line estimate. The stock is up big since General Electric reported Q1 2019 resul" related:"GE" image:"https://cloud.iexapis.com/v1/news/image/87bdd7f9-1aec-4943-835f-7a436483cede" lang:"en" hasPaywall:false} {"datetime":1556912727000 headline:"GE to build 500 MW coal power plant in Kosovo" source:"Seeking Alpha" url:"https://cloud.iexapis.com/v1/news/article/cf2c1bf4-2740-4f80-981b-7a68a8ec327f" summary:"A General Electric-led ([[GE]] +2.1%) consortium is selected to build and equip a new 500 MW coal-fired power plant in Kosovo. The new plant designed to m" related:"GE" image:"https://cloud.iexapis.com/v1/news/image/cf2c1bf4-2740-4f80-981b-7a68a8ec327f" lang:"en" hasPaywall:false} {"datetime":1556901660000 headline:"Good News! General Electric Burned Less Cash in Q1" source:"The Motley Fool" url:"https://cloud.iexapis.com/v1/news/article/492e3f89-1206-4547-a8a8-db086e773abb" summary:"Is there light at the end of the tunnel for the troubled industrial conglomerate?" related:"GE" image:"https://cloud.iexapis.com/v1/news/image/492e3f89-1206-4547-a8a8-db086e773abb" lang:"en" hasPaywall:false} {"datetime":1556893560000 headline:"The Zacks Analyst Blog Highlights: Apple Microsoft General Electric Ecolab and Celgene" source:"Zacks Investment Research" url:"https://cloud.iexapis.com/v1/news/article/739fd73b-1e73-4d85-8e8b-7ced62a4e5ea" summary:"The Zacks Analyst Blog Highlights: Apple Microsoft General Electric Ecolab and Celgene" related:"GE" image:"https://cloud.iexapis.com/v1/news/image/739fd73b-1e73-4d85-8e8b-7ced62a4e5ea" lang:"en" hasPaywall:false} {"datetime":1556889257000 headline:"GE to build Kosovo's new 500 MW coal power plant" source:"Reuters" url:"https://cloud.iexapis.com/v1/news/article/f5e2699b-b92e-4aed-aa90-02b557ce5652" summary:"Kosovo and London-listed power firm ContourGlobal said on Friday they had chosen a consortium of General Electric subsidiaries to build and equip a new 500 megawatt (MW)coal-fired power plant in the Balkan country." related:"GE" image:"https://cloud.iexapis.com/v1/news/image/f5e2699b-b92e-4aed-aa90-02b557ce5652" lang:"en" hasPaywall:false} {"datetime":1556877840000 headline:"Key Takeaways From General Electric's Earnings" source:"The Motley Fool" url:"https://cloud.iexapis.com/v1/news/article/356ca854-67f0-4e1b-a782-4f1986da535f" summary:"The company delivered a solid but unspectacular quarter but that might be just what investors need right now." related:"GE" image:"https://cloud.iexapis.com/v1/news/image/356ca854-67f0-4e1b-a782-4f1986da535f" lang:"en" hasPaywall:false} {"datetime":1556833186000 headline:"General Electric: Still A Mystery" source:"Seeking Alpha" url:"https://cloud.iexapis.com/v1/news/article/2a0c3a5c-32fc-4d40-b8e7-458792672cda" summary:"General Electric released its first quarter reports with a little bit of positive news and a little bit of negative news. As one analyst put it Did this change" related:"GE" image:"https://cloud.iexapis.com/v1/news/image/2a0c3a5c-32fc-4d40-b8e7-458792672cda" lang:"en" hasPaywall:false}]

    Sheet661 (3)



    Worksheet Formulas
    CellFormula
    A1=LEN(A2)



    ABCDE
    3
    4
    51557328805000GE CEO says weaker quarters likely after 'good start' to yearReutershttps//cloud.iexapis.com/v1/news/article/1e7714c0-93f6-4921-af24-f3122a2e809e
    61557307800000GE Investors Gather to Elect Revamped Board Air GripesThe Wall Street Journalhttps//cloud.iexapis.com/v1/news/article/dfb8a33f-9194-493e-9b83-9709dc0d18a9
    71557205509000IndiGo is making plans for another aggressive pushThe Economic Times Indiahttps//cloud.iexapis.com/v1/news/article/6ffe2450-5e9f-4166-b5f6-a26a79f10908
    81557085967000General Electric Larry Culp Is The Man (For The Job)Seeking Alphahttps//cloud.iexapis.com/v1/news/article/87bdd7f9-1aec-4943-835f-7a436483cede
    91556912727000GE to build 500 MW coal power plant in KosovoSeeking Alphahttps//cloud.iexapis.com/v1/news/article/cf2c1bf4-2740-4f80-981b-7a68a8ec327f
    101556901660000Good News! General Electric Burned Less Cash in Q1The Motley Foolhttps//cloud.iexapis.com/v1/news/article/492e3f89-1206-4547-a8a8-db086e773abb
    111556893560000The Zacks Analyst Blog Highlights Apple Microsoft General Electric Ecolab and CelgeneZacks Investment Researchhttps//cloud.iexapis.com/v1/news/article/739fd73b-1e73-4d85-8e8b-7ced62a4e5ea
    121556889257000GE to build Kosovo's new 500 MW coal power plantReutershttps//cloud.iexapis.com/v1/news/article/f5e2699b-b92e-4aed-aa90-02b557ce5652
    131556877840000Key Takeaways From General Electric's EarningsThe Motley Foolhttps//cloud.iexapis.com/v1/news/article/356ca854-67f0-4e1b-a782-4f1986da535f
    141556833186000General Electric Still A MysterySeeking Alphahttps//cloud.iexapis.com/v1/news/article/2a0c3a5c-32fc-4d40-b8e7-458792672cda
    15

    Sheet661 (3)



    Worksheet Formulas
    CellFormula
    A5=IF(COLUMNS($A5:A5)>4,"",SUBSTITUTE(SUBSTITUTE(TRIM(MID(SUBSTITUTE(SUBSTITUTE($A$2,CHOOSE(COLUMNS($A5:A5),"datetime","headline:","source:","url:"),REPT(" ",LEN($A$2)),ROWS(A$5:A5)),CHOOSE(COLUMNS($A5:A5),"headline:","source:","url:","summary:"),REPT(" ",LEN($A$2)),ROWS(A$5:A5)),LEN($A$2),LEN($A$2))),CHAR(34),""),":",""))



    NOTE: Formula is tested on your sample data from Post # 5, which is 5767 characters in length as indicated in A1 above, and does Not error out.
    My formula from Post # 9 was Also tested on your sample from Post # 5 as shown in Post # 9, and does Not error out, if it errored for you, you must have changed/adjusted the formula incorrectly.
    Last edited by jtakw; May 9th, 2019 at 04:40 PM.

  8. #18
    Board Regular
    Join Date
    Oct 2018
    Posts
    59
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

    Quote Originally Posted by jtakw View Post
    So, based on your sample layout in Post # 15, and using your sample data in Post # 5...

    A5 formula copied Across to Column D and Down as far as needed:

    Worksheet Formulas
    Cell Formula
    A5 =IF(COLUMNS($A5:A5)>4,"",SUBSTITUTE(SUBSTITUTE(TRIM(MID(SUBSTITUTE(SUBSTITUTE($A$2,CHOOSE(COLUMNS($A5:A5),"datetime","headline:","source:","url:"),REPT(" ",LEN($A$2)),ROWS(A$5:A5)),CHOOSE(COLUMNS($A5:A5),"headline:","source:","url:","summary:"),REPT(" ",LEN($A$2)),ROWS(A$5:A5)),LEN($A$2),LEN($A$2))),CHAR(34),""),":",""))

    NOTE: Formula is tested on your sample data from Post # 5, which is 5767 characters in length as indicated in A1 above, and does Not error out.
    My formula from Post # 9 was Also tested on your sample from Post # 5 as shown in Post # 9, and does Not error out, if it errored for you, you must have changed/adjusted the formula incorrectly.
    jtakw,

    That works perfectly in Google Sheets.
    Thank you very, very much.

    For some reason, it does not seem to be working in Excel 365 on a desktop.

    Can you please suggest why that might be happening?

    Thank you.

  9. #19
    Board Regular
    Join Date
    Oct 2018
    Posts
    59
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

    EDIT:

    I think I can see the source of the problem.
    In Excel, the WEBSERVICE call seems to be comma delimited, which is different than the Google Sheets IMPORTDATA call.

    [{"datetime":1557439398000,"headline":"The Little-Known Truth About The Carbon Footprint For The Tesla Model 3 And Other BEVs","source":"Seeking Alpha","url":"https://cloud.iexapis.com/v1/news/article/4e3cfd53-00e3-402a-89a7-fabc5cde087b","summary":"No summary available.","related":"TSLA,TTM","image":"https://cloud.iexapis.com/v1/news/image/4e3cfd53-00e3-402a-89a7-fabc5cde087b","lang":"en","hasPaywall":false},{"datetime":1557431760000,"headline":"Why Sunrun Stock Dropped 6% Today","source":"The Motley Fool","url":"https://cloud.iexapis.com/v1/news/article/8c733a8b-bf9e-4fa2-ab17-a0bf75a59e4e","summary":"Weak guidance (and competition from Tesla?) eclipse good news on Sunrun sales.","related":"TSLA","image":"https://cloud.iexapis.com/v1/news/image/8c733a8b-bf9e-4fa2-ab17-a0bf75a59e4e","lang":"en","hasPaywall":false},{"datetime":1557431040000,"headline":"When GM Sold a Plant to an Electric-Vehicle Startup...Named Tesla","source":"Barron's","url":"https://cloud.iexapis.com/v1/news/article/bb5a1634-9477-400c-aa9c-3f26afb55490","summary":"Workhorse is buying an idled car plant from General Motors. This isn’t the first time GM has sold a plant cheap to an upstart maker of electric vehicles.","related":"TSLA","image":"https://cloud.iexapis.com/v1/news/image/bb5a1634-9477-400c-aa9c-3f26afb55490","lang":"en","hasPaywall":true},{"datetime":1557426766000,"headline":"Tesla Battery Swap “Enron trading desk all over again”","source":"ValueWalk","url":"https://cloud.iexapis.com/v1/news/article/b9767020-bdbc-453c-9d91-719cceb05b79","summary":"Whitney Tilson’s email to investors discussing Glenn’s wise words on Tesla Inc (NASDAQ:TSLA); comments; Motor Mouths article. 1) Glenn with some wise words: Q1 hedge fund letters, conference, scoops etc Frankly, most of the people, many bears included, have become […] The post Tesla Battery Swap “Enron trading desk all over again” appeared first on ValueWalk .","related":"TSLA","image":"https://cloud.iexapis.com/v1/news/image/b9767020-bdbc-453c-9d91-719cceb05b79","lang":"en","hasPaywall":false},{"datetime":1557416228000,"headline":"Hilary Duff and boyfriend Matthew Koma are engaged: 'He asked me to be his wife'","source":"Business Insider","url":"https://cloud.iexapis.com/v1/news/article/1ac8113a-3028-45f4-bba2-674d1905b62d","summary":"On Thursday, Hilary Duff and Matthew Koma revealed that they were engaged. " He asked me to be his wife," Duff captioned two images on Instagram , one of which showed her sparkling ring. The couple has been linked since 2017 and welcomed a baby girl named Banks Violet Bair in October 2018 . Visit INSIDER's homepage for more stories. Hilary Duff and boyfriend Matthew Koma are engaged. The 31-year-old "Younger" star revealed the milestone with an Instagram post that was shared on Thursday. " He asked me to be his wife," Duff captioned two images. One photo showed her, Koma (whose real name is Matthew Bair), and the engagement ring while the second one showed the couple kissing. He asked me to be his wife♥️ A post shared by Hilary Duff (@hilaryduff) on May 9, 2019 at 7:20am PDT on May 9, 2019 at 7:20am PDT Koma, a musician who's part of a band called Winnetka Bowling League, shared the same images on his Instagram and wrote: " I asked my best friend to marry me." Read more : Hilary Duff says she struggles to ignore mom-shaming on Instagram: 'Bullying seems to be at an all-time high right now' It's unclear when Duff and Koma started dating, but they have been linked since early 2017.","related":"TSLA","image":"https://cloud.iexapis.com/v1/news/image/1ac8113a-3028-45f4-bba2-674d1905b62d","lang":"en","hasPaywall":false},{"datetime":1557416038000,"headline":"Intel's self-driving leader on why Uber, Lyft and Tesla are in a robo-taxi race for their lives","source":"CNBC","url":"https://cloud.iexapis.com/v1/news/article/20f34de1-d57c-46ef-bd5b-18ec018aaa53","summary":"A leading autonomous-driving executive from Mobileye, a company Intel bought for $15 billion, explains why Lyft, Uber, Tesla and any other company seeking a robo-taxi future is in a race to bring down costs by as much as 50%. And then there's the safety issues of teaching AI to drive without a threat to life.","related":"TSLA","image":"https://cloud.iexapis.com/v1/news/image/20f34de1-d57c-46ef-bd5b-18ec018aaa53","lang":"en","hasPaywall":false}]


    I'm not exactly sure how to modify your excellent formula.

    Any assistance would be greatly appreciated.

    Thank you.
    Last edited by blafarm; May 9th, 2019 at 11:23 PM.

  10. #20
    Board Regular
    Join Date
    Oct 2018
    Posts
    59
    Post Thanks / Like
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Default Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

    Hi jtakw,

    You've been extremely helpful, and I thank you for your time.

    Unfortunately, I'm stuck on this last part of the puzzle.

    I don't know how to modify your formula to work with the Excel string listed above (that has commas).

    I have tried many versions -- but none work.

    I would be very grateful if you could point me in the right direction.

    Thanks

Some videos you may like

User Tag List

Tags for this Thread

Like this thread? Share it with others

Like this thread? Share it with others

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •