Extract Multiple Sections of Text in a String, Between Two Repeating Words

blafarm

Board Regular
Joined
Oct 14, 2018
Messages
59
Hi All,

I have a bit of a challenge and was hoping for some guidance.

I have a very long string of text in a single cell and I need to extract multiple sections of text that appear between two words that repeat in the string.

For example, here is a very simplified version of the text string in Cell A1:

{"date": 5/7/19 headline:"GE Posts Profit" source:"CNBC"}{"date": 5/8/19 headline:"GE Dividend Shrink" source:"MSN"}{"date": 5/9/19 headline:"GE Bankrupt" source:"WSJ"}

I have had limited success with the following formula, however it only extracts the text between the first appearance of the words: "headline" and "source":

Formula: =MID(A1,SEARCH("headline",A1)+2,SEARCH("source:",A1)-SEARCH("headline",A1)-4)

Result: GE Posts Profit


This formula does not capture ALL of the headlines in the full string -- only the first headline.

My initial goal is to learn a formula that will extract ALL of the headlines within the string, and that will place the results in a succession of horizontal or vertical cells that either look like this:

GE Posts ProfitGE Dividends ShrinkGE Bankrupt

<tbody>
</tbody>

... or that look like this:

GE Posts Profit
GE Dividends Shrink
GE Bankrupt

<tbody>
</tbody>

I have some additional questions that are directly related to this inquiry, but I'm trying to start by keeping it simple.

Thanks very much in advance.

Cheers
 
Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

For your EXCEL version:


Book1
A
14332
2[{"datetime":1557439398000,"headline":"The Little-Known Truth About The Carbon Footprint For The Tesla Model 3 And Other BEVs","source":"Seeking Alpha","url":"https://cloud.iexapis.com/v1/news/article/4e3cfd53-00e3-402a-89a7-fabc5cde087b","summary":"No summary available.","related":"TSLA,TTM","image":"https://cloud.iexapis.com/v1/news/image/4e3cfd53-00e3-402a-89a7-fabc5cde087b","lang":"en","hasPaywall":false},{"datetime":1557431760000,"headline":"Why Sunrun Stock Dropped 6% Today","source":"The Motley Fool","url":"https://cloud.iexapis.com/v1/news/article/8c733a8b-bf9e-4fa2-ab17-a0bf75a59e4e","summary":"Weak guidance (and competition from Tesla?) eclipse good news on Sunrun sales.","related":"TSLA","image":"https://cloud.iexapis.com/v1/news/image/8c733a8b-bf9e-4fa2-ab17-a0bf75a59e4e","lang":"en","hasPaywall":false},{"datetime":1557431040000,"headline":"When GM Sold a Plant to an Electric-Vehicle Startup...Named Tesla","source":"Barron's","url":"https://cloud.iexapis.com/v1/news/article/bb5a1634-9477-400c-aa9c-3f26afb55490","summary":"Workhorse is buying an idled car plant from General Motors. This isnt the first time GM has sold a plant cheap to an upstart maker of electric vehicles.","related":"TSLA","image":"https://cloud.iexapis.com/v1/news/image/bb5a1634-9477-400c-aa9c-3f26afb55490","lang":"en","hasPaywall":true},{"datetime":1557426766000,"headline":"Tesla Battery Swap Enron trading desk all over again","source":"ValueWalk","url":"https://cloud.iexapis.com/v1/news/article/b9767020-bdbc-453c-9d91-719cceb05b79","summary":"Whitney Tilsons email to investors discussing Glenns wise words on Tesla Inc (NASDAQ:TSLA); comments; Motor Mouths article. 1) Glenn with some wise words: Q1 hedge fund letters, conference, scoops etc Frankly, most of the people, many bears included, have become [] The post Tesla Battery Swap Enron trading desk all over again appeared first on ValueWalk .","related":"TSLA","image":"https://cloud.iexapis.com/v1/news/image/b9767020-bdbc-453c-9d91-719cceb05b79","lang":"en","hasPaywall":false},{"datetime":1557416228000,"headline":"Hilary Duff and boyfriend Matthew Koma are engaged: 'He asked me to be his wife'","source":"Business Insider","url":"https://cloud.iexapis.com/v1/news/article/1ac8113a-3028-45f4-bba2-674d1905b62d","summary":"On Thursday, Hilary Duff and Matthew Koma revealed that they were engaged. " He asked me to be his wife," Duff captioned two images on Instagram , one of which showed her sparkling ring. The couple has been linked since 2017 and welcomed a baby girl named Banks Violet Bair in October 2018 . Visit INSIDER's homepage for more stories. Hilary Duff and boyfriend Matthew Koma are engaged. The 31-year-old "Younger" star revealed the milestone with an Instagram post that was shared on Thursday. " He asked me to be his wife," Duff captioned two images. One photo showed her, Koma (whose real name is Matthew Bair), and the engagement ring while the second one showed the couple kissing. He asked me to be his wife?? A post shared by Hilary Duff (@hilaryduff) on May 9, 2019 at 7:20am PDT on May 9, 2019 at 7:20am PDT Koma, a musician who's part of a band called Winnetka Bowling League, shared the same images on his Instagram and wrote: " I asked my best friend to marry me." Read more : Hilary Duff says she struggles to ignore mom-shaming on Instagram: 'Bullying seems to be at an all-time high right now' It's unclear when Duff and Koma started dating, but they have been linked since early 2017.","related":"TSLA","image":"https://cloud.iexapis.com/v1/news/image/1ac8113a-3028-45f4-bba2-674d1905b62d","lang":"en","hasPaywall":false},{"datetime":1557416038000,"headline":"Intel's self-driving leader on why Uber, Lyft and Tesla are in a robo-taxi race for their lives","source":"CNBC","url":"https://cloud.iexapis.com/v1/news/article/20f34de1-d57c-46ef-bd5b-18ec018aaa53","summary":"A leading autonomous-driving executive from Mobileye, a company Intel bought for $15 billion, explains why Lyft, Uber, Tesla and any other company seeking a robo-taxi future is in a race to bring down costs by as much as 50%. And then there's the safety issues of teaching AI to drive without a threat to life.","related":"TSLA","image":"https://cloud.iexapis.com/v1/news/image/20f34de1-d57c-46ef-bd5b-18ec018aaa53","lang":"en","hasPaywall":false}]
Sheet661 (6)
Cell Formulas
RangeFormula
A1=LEN(A2)



Book1
A
3
4
5false
6false
7true
8false
9false
10false
11
Sheet661 (6)
Cell Formulas
RangeFormula
A5=TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(MID(SUBSTITUTE(SUBSTITUTE(A$2,"hasPaywall",REPT(" ",LEN($A$2)),ROWS(A$5:A5)),"datetime",REPT(" ",LEN(A$2)),ROWS(A$5:A5)+1),LEN(A$2),LEN(A$2)),"{",""),"}",""),CHAR(34),""),"]",""),":",""),",",""))


Heading to the Pet crematorium 1st thing in the morning, will not be able to respond again until tomorrow night at the earliest...
 
Last edited:
Upvote 0

Excel Facts

Control Word Wrap
Press Alt+Enter to move to a new row in a cell. Lets you control where the words wrap.
Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

For your EXCEL version:

Heading to the Pet crematorium 1st thing in the morning, will not be able to respond again until tomorrow night at the earliest...

Thank you very much jtakw.
 
Upvote 0
Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

with PowerQuery aka Get&Transform:

datetimeheadlinesourceurlsummaryrelatedimagehasPaywall
1557328805000GE CEO says weaker quarters likely after 'good start' to yearReutershttps://cloud.iexapis.com/v1/news/article/1e7714c0-93f6-4921-af24-f3122a2e809eGeneral Electric Co Chief Executive Larry Culp said on Wednesday the company will likely have weaker quarters in the rest of the year after a surprisingly good start in the first quarter.GEhttps://cloud.iexapis.com/v1/news/image/1e7714c0-93f6-4921-af24-f3122a2e809e lang:enfalse
1557307800000GE Investors Gather to Elect Revamped Board Air GripesThe Wall Street Journalhttps://cloud.iexapis.com/v1/news/article/dfb8a33f-9194-493e-9b83-9709dc0d18a9General Electric Chairman and CEO Larry Culp who joined the company just a year ago is expected to answer questions from investors at the shareholder meeting in Tarrytown N.Y.GEhttps://cloud.iexapis.com/v1/news/image/dfb8a33f-9194-493e-9b83-9709dc0d18a9 lang:entrue
1557205509000IndiGo is making plans for another aggressive pushThe Economic Times Indiahttps://cloud.iexapis.com/v1/news/article/6ffe2450-5e9f-4166-b5f6-a26a79f10908by Anurag KotokyIndiGo is in talks with Airbus SE for another large plane order in a sign Asia’s biggest budget carrier has no intention of letting up on a blistering pace of expansion.The Indian airline which adds an aircraft to its fleet every week is in discussions to buy a longer-range version of the European planemaker’s newest narrow-body jet according to Chief Executive Officer Ronojoy Dutta.Existing orders will see IndiGo through the next two years Dutta said in an interview with Bloomberg News Editor-in-Chief John Micklethwait. After that the carrier is considering adding new planes including Airbus’s long-distance A321neo LR and the yet-to-be-released A321 XLR he said.“When we order we’ll order in bulk for sure ” Dutta said in New Delhi in his first interview since becoming CEO of the airline in January. “I can’t give you a tentative number but it will be large. We need longer range.”Buying BigThough Dutta didn’t specify the size of IndiGo’s next purchase it’s likely to be a multi-billion-dollar order based on the company’s history.GEhttps://cloud.iexapis.com/v1/news/image/6ffe2450-5e9f-4166-b5f6-a26a79f10908 lang:enfalse
1557085967000General Electric: Larry Culp Is The Man (For The Job)Seeking Alphahttps://cloud.iexapis.com/v1/news/article/87bdd7f9-1aec-4943-835f-7a436483cedeGeneral Electric reported strong Q1 2019 results that beat the consensus bottom-line estimate. The stock is up big since General Electric reported Q1 2019 resulGEhttps://cloud.iexapis.com/v1/news/image/87bdd7f9-1aec-4943-835f-7a436483cede lang:enfalse
1556912727000GE to build 500 MW coal power plant in KosovoSeeking Alphahttps://cloud.iexapis.com/v1/news/article/cf2c1bf4-2740-4f80-981b-7a68a8ec327fA General Electric-led ([[GE +2.1%) consortium is selected to build and equip a new 500 MW coal-fired power plant in Kosovo. The new plant designed to mGEhttps://cloud.iexapis.com/v1/news/image/cf2c1bf4-2740-4f80-981b-7a68a8ec327f lang:enfalse
1556901660000Good News! General Electric Burned Less Cash in Q1The Motley Foolhttps://cloud.iexapis.com/v1/news/article/492e3f89-1206-4547-a8a8-db086e773abbIs there light at the end of the tunnel for the troubled industrial conglomerate?GEhttps://cloud.iexapis.com/v1/news/image/492e3f89-1206-4547-a8a8-db086e773abb lang:enfalse
1556893560000The Zacks Analyst Blog Highlights: Apple Microsoft General Electric Ecolab and CelgeneZacks Investment Researchhttps://cloud.iexapis.com/v1/news/article/739fd73b-1e73-4d85-8e8b-7ced62a4e5eaThe Zacks Analyst Blog Highlights: Apple Microsoft General Electric Ecolab and CelgeneGEhttps://cloud.iexapis.com/v1/news/image/739fd73b-1e73-4d85-8e8b-7ced62a4e5ea lang:enfalse
1556889257000GE to build Kosovo's new 500 MW coal power plantReutershttps://cloud.iexapis.com/v1/news/article/f5e2699b-b92e-4aed-aa90-02b557ce5652Kosovo and London-listed power firm ContourGlobal said on Friday they had chosen a consortium of General Electric subsidiaries to build and equip a new 500 megawatt (MW)coal-fired power plant in the Balkan country.GEhttps://cloud.iexapis.com/v1/news/image/f5e2699b-b92e-4aed-aa90-02b557ce5652 lang:enfalse
1556877840000Key Takeaways From General Electric's EarningsThe Motley Foolhttps://cloud.iexapis.com/v1/news/article/356ca854-67f0-4e1b-a782-4f1986da535fThe company delivered a solid but unspectacular quarter but that might be just what investors need right now.GEhttps://cloud.iexapis.com/v1/news/image/356ca854-67f0-4e1b-a782-4f1986da535f lang:enfalse
1556833186000General Electric: Still A MysterySeeking Alphahttps://cloud.iexapis.com/v1/news/article/2a0c3a5c-32fc-4d40-b8e7-458792672cdaGeneral Electric released its first quarter reports with a little bit of positive news and a little bit of negative news. As one analyst put it Did this changeGEhttps://cloud.iexapis.com/v1/news/image/2a0c3a5c-32fc-4d40-b8e7-458792672cda lang:enfalse

<tbody>
</tbody>


Code:
[SIZE=1]// Table1
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    Replace = Table.ReplaceValue(Table.ReplaceValue(Table.ReplaceValue(Table.ReplaceValue(Source,"""","",Replacer.ReplaceText,{"Column1"}),"}","",Replacer.ReplaceText,{"Column1"}),"{","",Replacer.ReplaceText,{"Column1"}),"]","",Replacer.ReplaceText,{"Column1"}),
    Split = Table.ExpandListColumn(Table.TransformColumns(Replace, {{"Column1", Splitter.SplitTextByAnyDelimiter({"datetime:","headline:","source:","url:","summary:","related:","image:","hasPaywall:"}, QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
    RTR = Table.Skip(Split,1),
    IntDivide = Table.TransformColumns(Table.AddIndexColumn(RTR, "Index", 0, 1), {{"Index", each Number.IntegerDivide(_, 8), Int64.Type}}),
    Group = Table.Group(IntDivide, {"Index"}, {{"Count", each _, type table}}),
    List = Table.AddColumn(Group, "Custom", each Table.Column([Count],"Column1")),
    Extract = Table.TransformColumns(List, {"Custom", each Text.Combine(List.Transform(_, Text.From), "^"), type text}),
    Split1 = Table.SplitColumn(Extract, "Custom", Splitter.SplitTextByDelimiter("^", QuoteStyle.Csv)),
    RC = Table.RemoveColumns(Split1,{"Index", "Count"}),
    Trim = Table.TransformColumns(RC,{{"Custom.1", Text.Trim, type text}, {"Custom.2", Text.Trim, type text}, {"Custom.3", Text.Trim, type text}, {"Custom.4", Text.Trim, type text}, {"Custom.5", Text.Trim, type text}, {"Custom.6", Text.Trim, type text}, {"Custom.7", Text.Trim, type text}}),
    Ren = Table.RenameColumns(Trim,{{"Custom.1", "datetime"}, {"Custom.2", "headline"}, {"Custom.3", "source"}, {"Custom.4", "url"}, {"Custom.5", "summary"}, {"Custom.6", "related"}, {"Custom.7", "image"}, {"Custom.8", "hasPaywall"}})
in
    Ren[/SIZE]

you can remove unnecessary column(s)

sandy666,

Thanks very much for all your efforts in creating this solution.
It is a very interesting approach that I hope to use in a future sheet.
However, my current sheet is built around a specific architecture that I must perpetuate.
Thanks again for sharing this.

Cheers
 
Upvote 0
Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

You are welcome

Have a nice day
 
Upvote 0
Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

Hey, it should work as I have replaced instances of "headline:" that appear in your string, with the pipe, then searched for the pipe, the reason to use a SEARCH(SUBSTITUTE combination is so you can iterate through instances, hence the use of ROW() and dragging it down - what error do you get when trying it? It works on my screen so curious why it doesn't work for you

tyija1995,

Sorry for the delayed response.
Thank you too for your efforts in devising a solution for me.
It is also an interesting approach that I need to spend more time with in order to find out why I had a problem.
As mentioned above, my current sheet is built around a specific architecture that I must perpetuate.
Thanks again for sharing this.

Cheers
 
Upvote 0
Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

Hi jtakw,

I sent you a PM.
Not sure if your Inbox is full.
If you're just busy -- no problem.

Thank
 
Upvote 0
Re: How to Extract Multiple Sections of Text in a String, Between Two Repeating Words

jtakw,

I had to break my last PM to you into 3 parts (not 2 parts as I thought).

However, I can't send the last part because your Inbox is full.

Please let me know when I can resend it.

Apologies to everyone for this distraction
 
Upvote 0

Forum statistics

Threads
1,213,517
Messages
6,114,089
Members
448,548
Latest member
harryls

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top