Convert Categorization Formula to Power Query

gimmeexcel

Board Regular
Joined
May 8, 2009
Messages
95
Hi everyone,

I have a worksheet that I use to categorize keywords into topics (analytics). It uses a formula that matches the keywords in a main table to 4 other topic tables.

Here's the formula that matches each topic:

Category Type 1: IFERROR(LOOKUP(9.99999999999999E+307,SEARCH(Category18[Find],A4),Category18[Category]),$M$2)
Category Type 2: IFERROR(LOOKUP(9.99999999999999E+307,SEARCH(Category29[Find],A4),Category29[Category]),$P$2)
Category Type 3: IFERROR(LOOKUP(9.99999999999999E+307,SEARCH(Category310[Find],A4),Category310[Category]),$S$2)
Category Type 4: IFERROR(LOOKUP(9.99999999999999E+307,SEARCH(Category411[Find],A5),Category411[Category]),$V$2)

Here's a link to my OneDrive, but unfortunately the formulas aren't copied over

https://devdaygo-my.sharepoint.com/...O3orp4wQc&expiration=2017-03-16T22:11:25.000Z

It works well for a few thousand rows, but I'm running into major problems with a dataset of 120,000 rows. I'm thinking Power Query should be able to calculate more efficiently.

I see that Power Query has a Text.Contains function, so I'm wondering if that could be substituted for the formula that I'm using?

Please take a look at the attached worksheet link and let me know if formula could be converted to a Power Query. Keep in mind, that each topic table can be expanded and may include up to 30 topics for each table.

All help will be gladly appreciated. Thanks!
 

Excel Facts

How to fill five years of quarters?
Type 1Q-2023 in a cell. Grab the fill handle and drag down or right. After 4Q-2023, Excel will jump to 1Q-2024. Dash can be any character.
You can try creating a function that will add a column for each new category like this:
Code:
// fnAddCategoryColumn
(Source as table, Header as text, TextToCompare as text) as table =>


let


    AddCategoryColumn = 
    Table.AddColumn(
        Source, 
        Header, 
        each 
            if Text.Contains( 
                [Keyword], 
                TextToCompare, 
                Comparer.FromCulture( 
                    Culture.Current, 
                    true 
                ) 
            ) 
            then TextToCompare 
            else "", 
        type text 
    )


in


    AddCategoryColumn
And invoke it in a query that has the keyword column. I created a dummy file you can find in Dropbox with your sample data and a query that adds the 4 Topic columns using the fnAddCategoryColumn function.

Code:
// tKeywords
let
    Source = Excel.CurrentWorkbook(){ [Name="tKeywords"] }[Content],
    ChangedType = Table.TransformColumnTypes( Source, {{ "Keyword", type text }} ),
    AddTopic01 = fnAddCategoryColumn( ChangedType, "Topic01", "Symptom" ),
    AddTopic02 = fnAddCategoryColumn( AddTopic01, "Topic02", "Cause" ),
    AddTopic03 = fnAddCategoryColumn( AddTopic02, "Topic03", "Treatment" ),
    AddTopic04 = fnAddCategoryColumn( AddTopic03, "Topic04", "What" )
in
    AddTopic04

I hope this helps.
 
Last edited:
Upvote 0
Hi FranzV,

Thanks so much for your help.

Unfortunately, I'm completely lost on how to use this code. One of the things that's confusing for me is that your result layout does not contain my original layout. So I can't see how everything is matching up.

Where do I create the function that adds the columns, from Excel or Power Query? And do I need to set up the topic tables like in my sample data?

After that, what do I need to do to invoke?

So simple steps would be tremendously helpful. Or if you have a link to an article or video that would help me understand how this work, that would be great too.

Thanks!
 
Upvote 0
Hi FranzV,

Thanks so much for your help.

Unfortunately, I'm completely lost on how to use this code.
I am sorry to have missed the basic explanation on how to implement "M" code in Power Query. I didn't realize you are just starting with this amazing feature. Reza Rad has an intro for Power Query and an explanation on custom functions that might help you get the hang of how Power Query works.

One of the things that's confusing for me is that your result layout does not contain my original layout. So I can't see how everything is matching up.
I updated the Dropbox file to look very similar to your OneDrive file. I included the Metric 1 - 4 columns, though I do not know if or how you use them.

There are 3 sheets in my workbook. The first is an Excel-Only solution with a simplified version of your formulas. I tested it with 10 category columns and ~500k rows and it worked fine (I removed the extra rows to keep the file's size decent). It looks llike this:
[B]Excel 2016 (Windows) 64 bit[/B] [SIZE=2][TABLE="class: head"]
<tbody>[TR="bgcolor: #888888"]
[TH][/TH]
[TH][COLOR=#FFFFFF]A[/COLOR][/TH]
[TH][COLOR=#FFFFFF]B[/COLOR][/TH]
[TH][COLOR=#FFFFFF]C[/COLOR][/TH]
[TH][COLOR=#FFFFFF]D[/COLOR][/TH]
[TH][COLOR=#FFFFFF]E[/COLOR][/TH]
[TH][COLOR=#FFFFFF]F[/COLOR][/TH]
[TH][COLOR=#FFFFFF]G[/COLOR][/TH]
[TH][COLOR=#FFFFFF]H[/COLOR][/TH]
[TH][COLOR=#FFFFFF]I[/COLOR][/TH]
[/TR]
[TR]
[TD="bgcolor: #888888"][CENTER][COLOR=#FFFFFF][B]1[/B][/COLOR][/CENTER]
[/TD]
[TD][B]Keyword Categorizer[/B][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #888888"][CENTER][COLOR=#FFFFFF][B]2[/B][/COLOR][/CENTER]
[/TD]
[TD][I][COLOR=#C00000]Excel only solution[/COLOR][/I][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #888888"][CENTER][COLOR=#FFFFFF][B]3[/B][/COLOR][/CENTER]
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #888888"][CENTER][COLOR=#FFFFFF][B]4[/B][/COLOR][/CENTER]
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD="bgcolor: #D9D9D9"][B][COLOR=#00B050]Symptom[/COLOR][/B][/TD]
[TD="bgcolor: #D9D9D9"][B][COLOR=#00B050]Cause[/COLOR][/B][/TD]
[TD="bgcolor: #D9D9D9"][B][COLOR=#00B050]Treatment[/COLOR][/B][/TD]
[TD="bgcolor: #D9D9D9"][B][COLOR=#00B050]What[/COLOR][/B][/TD]
[/TR]
[TR]
[TD="bgcolor: #888888"][CENTER][COLOR=#FFFFFF][B]5[/B][/COLOR][/CENTER]
[/TD]
[TD="bgcolor: #5B9BD5"][B][COLOR=#FFFFFF]Keyword[/COLOR][/B][/TD]
[TD="bgcolor: #5B9BD5"][B][COLOR=#FFFFFF]Metric 1[/COLOR][/B][/TD]
[TD="bgcolor: #5B9BD5"][B][COLOR=#FFFFFF]Metric 2[/COLOR][/B][/TD]
[TD="bgcolor: #5B9BD5"][B][COLOR=#FFFFFF]Metric 3[/COLOR][/B][/TD]
[TD="bgcolor: #5B9BD5"][B][COLOR=#FFFFFF]Metric 4[/COLOR][/B][/TD]
[TD="bgcolor: #5B9BD5"][B][COLOR=#FFFFFF]Topic 1[/COLOR][/B][/TD]
[TD="bgcolor: #5B9BD5"][B][COLOR=#FFFFFF]Topic 2[/COLOR][/B][/TD]
[TD="bgcolor: #5B9BD5"][B][COLOR=#FFFFFF]Topic 3[/COLOR][/B][/TD]
[TD="bgcolor: #5B9BD5"][B][COLOR=#FFFFFF]Topic 4[/COLOR][/B][/TD]
[/TR]
[TR]
[TD="bgcolor: #888888"][CENTER][COLOR=#FFFFFF][B]6[/B][/COLOR][/CENTER]
[/TD]
[TD]migraine headache symptoms[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Symptom[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #888888"][CENTER][COLOR=#FFFFFF][B]7[/B][/COLOR][/CENTER]
[/TD]
[TD]migrane headache symptoms[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Symptom[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #888888"][CENTER][COLOR=#FFFFFF][B]8[/B][/COLOR][/CENTER]
[/TD]
[TD]what causes severe migraine headaches[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Cause[/TD]
[TD][/TD]
[TD]What[/TD]
[/TR]
[TR]
[TD="bgcolor: #888888"][CENTER][COLOR=#FFFFFF][B]9[/B][/COLOR][/CENTER]
[/TD]
[TD]what is migraine headache symptoms[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Symptom[/TD]
[TD][/TD]
[TD][/TD]
[TD]What[/TD]
[/TR]
[TR]
[TD="bgcolor: #888888"][CENTER][COLOR=#FFFFFF][B]10[/B][/COLOR][/CENTER]
[/TD]
[TD]migraine causes[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Cause[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
[/SIZE][TABLE="class: grid"]
<tbody>[TR]
[TD]Sheet: [B]ExcelOnly[/B][/TD]
[/TR]
</tbody>[/TABLE]


I changed the LOOKUP() for an ISERROR() that will return FALSE whenever the word just above the header (green on grey) is found in the Keyword column. The outer if will return the searchword when it is found an empty string when it isn't. There is no need for the Topic Tables with this formula.
Worksheet Formulas
CellFormula
F6=IF(ISERROR(SEARCH(F$4,[@Keyword])),"",F$4)

<tbody>
</tbody>

<tbody>
</tbody>

I have no clue why you were using the BigNumber LOOKUP.

The second sheet holds a table with the Keyword column and the Metric columns called 'tKeywords' that is used as a source for the PQ query.

The third has a table called 'tPowerQuery' that is loaded into the worksheet after adding the Topic columns.

I'm assuming this is the result you were looking for. If it isn't, I must have understood you incorrectly and would need further details that might help me help you.

Where do I create the function that adds the columns, from Excel or Power Query? And do I need to set up the topic tables like in my sample data?

Now on to the Power Query explanation.

To create a new query using a table as a source, simply select any cell within the table and click 'From Table' in the 'Get & Transform' group of the 'Data' tab in the Menu Ribbon. That will automatically open the Power Query Editor window and display the data in the table without any formatting. From there, you can use the options in the Ribbon to transform your data, add columns, filter rows, etc.

The user interface is very intutive and I would encourage you to play around with your data to grasp the capabilities of PQ. As you apply new transformation steps to your data, you will see a list of them on the right side of the screen. If you select any of the steps you can preview how the infromation looks right after applying that step.

Sometimes you need to repeat certain operations over and over and it becomes tedious and error prone to keep rewriting the same formulas every time. That's when Custom Functions are especially useful because they let you reuse a piece of code in other queries and can take in arguments that make them quite flexible.

The most common way to create a function is by applying the desired steps through the user interface and then go to the Advanced Editor to tweak the code turning it to a function. That is how I created the function for adding the Topic columns. If the number of categories will be changing oftenly, it might need some tweaking to use a table as a source for searchwords and add all the columns at once.

After that, what do I need to do to invoke?

So simple steps would be tremendously helpful.

Once the function is ready, you can return to your original query and click on fx just left of the formula bar. This will add a new step to the query whose default setting is '= PreviousStep' where PreviousStep is the name of the queries previous step. You can edit the formula and call the custom function just as you would call an Excel function, including the parameters inside parenthesis.

You have to do this once for each Topic column, changing the header and the searchword.

PS. You can fin great posts on Power Query of you follow the links in my signature for Imke Feldmann, Ken Puls, or Chris Webb.
 
Last edited:
Upvote 0
The idea of a more flexible query couldn't get out of my head, so I changed the approach using a Parameters Table that can let you specify the column names' prefix (eg. Topic ), the case sensitivity and the default text to display if searchwords are not found.

You can find the updated version in the same Dropbox link and the "M" code is the following.

Code:
// tKeywords
let


//This step loads the keyword table from a table called 'tKeywords'.
    Source = Excel.CurrentWorkbook(){[Name="tKeywords"]}[Content],


//This step is not mandatory, but it is common best practive to use the correct data types for each column.
    ChangedType = Table.TransformColumnTypes(Source, List.Transform( Table.ColumnNames( Source ), each { _, type text } ) )
in
    ChangedType


// tCategories
let
    Source = Excel.CurrentWorkbook(){[Name="tCategories"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Category", type text}, {"SearchWord", type text}})
in
    #"Changed Type"


// tParameters
let


//This step loads the keyword table from a table called 'tKeywords'.
    Source = Excel.CurrentWorkbook(){[Name="tParameters"]}[Content]
in
    Source


// FlexibleQuery
let
    Source = tKeywords,
    Prefix = try tParameters{ [Parameter = "HeaderPrefix"] }[Value] as text otherwise "", 
    CategoryList = tCategories[SearchWord], 
    IgnoreCase = try not tParameters{ [Parameter = "CaseSensitive"] }[Value] as logical otherwise true, 
    Default = try tParameters{ [Parameter = "DefaultValue"] }[Value] as text otherwise " ", 
    Check = Table.AddColumn( 
        Source, 
        Prefix, 
        each Text.Combine( 
            List.Transform( 
                CategoryList,  
                (substring) => 
                    if Text.Contains( [Keyword], substring, Comparer.FromCulture( Culture.Current, IgnoreCase ) ) 
                    then tCategories{[SearchWord=substring]}[DisplayWord]
                    else Default 
            ), 
            "#(tab)" 
        ), 
        type text
    ),


    Split = Table.SplitColumn(
        Check, 
        Prefix, 
        Splitter.SplitTextByDelimiter("#(tab)", QuoteStyle.Csv), 
        List.Transform( 
            List.Numbers( 1, List.Count( CategoryList ), 1 ), 
            each Prefix & Number.ToText( _ ) 
        ) 
    )
in
    Split
 
Upvote 0
Hi FranzV,

Please except my apology for not responding sooner. I have been swamped all day with work.

I will try all of this code out, the first time I get a chance tomorrow.

I really appreciate you for taking the time to help.

I'll follow up as soon as I finish testing.

All the best,
GE
 
Upvote 0
Hi FranzV,

My mind is still spinning from the impressiveness of your solutions. I can't begin to thank you for dedicating so much of your time.

The "Excel-Only solution" is the one that I can most easily understand. I'm still going through the Power Query because I definitely want to learn how it works.

I have a specific question about the "Excel-Only solution". I guess I should backtrack because this wasn't really clear with the sample data that I provided. In my original sample, each one of those tables was capable of adding multiple words. From what I can see with your solution, it's only possible to use one word per topic?

Maybe an example would make it easier to see why I'd want to set the tables up in this manner...

Keyword List
atlanta plastic surgery
atlanta plastic surgeon
atlanta cosmetic surgery
atlanta cosmetic surgeon
los angeles plastic surgery
los angeles plastic surgeon
los angeles cosmetic surgery
los angeles cosmetic surgeon
detroit plastic surgery
detroit plastic surgeon
detroit cosmetic surgery
detroit cosmetic surgeon
orlando plastic surgery
orlando plastic surgeon
orlando cosmetic surgery
orlando cosmetic surgeon
memphis plastic surgery
memphis plastic surgeon
memphis cosmetic surgery
memphis cosmetic surgeon
miami plastic surgery
miami plastic surgeon
miami cosmetic surgery
miami cosmetic surgeon


With my original tables, I'd segment the keywords into 2 Topics:

Topic 1: Geo-location/City
atlanta
los angeles
detroit
orlando
memphis
miami

Topic 2: Modifier
plastic
cosmetic


I'm just wondering if your Excel solution can be modified in this way?

In the meantime, I'll continue to work through your Power Query examples.

Thanks again for your time and help.
 
Upvote 0
Now I understand why you where using the LOOKUP() in your original formula. Sadly I don't know how to improve your formula. I normally use
Code:
{= MATCH( FALSE, ISERROR( SEARCH( SearchTerms, KeyWord ) ), 0 )}
instead of the BigNumber LOOKUP, but I don't know if there is any performance difference between them.

I ran my version on a 120k row sample searching for 100 different words and it took almost a minute to calculate.

If a different idea comes up, I'll let you know.
 
Upvote 0
Thanks FranzV. I really appreciate all of your help.

I'm reading some articles on Power Query. So, I'll see if there's a way to make it work.

Regardless, all of your solutions are tremendously helpful.
 
Upvote 0

Forum statistics

Threads
1,214,819
Messages
6,121,729
Members
449,049
Latest member
MiguekHeka

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top