TEXTJOIN in Power Query
October 03, 2017 - by Bill Jelen
CONCATENATEX in Power Query. The new TEXTJOIN function is awesome. Can you do the same thing with Power Query? Yes. Now you can.
- A viewer downloads data from a system where each item is separated by Alt+Enter
- Bill: Why are you doing this? Viewer: It is how I inherit the data. I want to keep it that way.
- Bill: What do you want to do with the 40% of values not in the table? Viewer: No answer
- Bill: There is a complicated way to solve this if you have the latest Power Query tools.
- Instead, a VBA Macro to solve it - the macro should work all the way back to Excel 2007
- Instead of doing VLOOKUP, do a series of Find & Replace with VBA
Learn Excel from MrExcel, Podcast Episode 2151.
I really don't know what to call this one. If I'm trying to attract the people who use DAX, I would say ConcatenateX in Power Query, or just the people who use regular Excel but Office 365, I’d say TEXTJOIN in Power Query, or, to be completely honest, it's a super complex set of steps in Power Query to enable a super-insane solution in Excel.
Hey. Welcome back to the MrExcel netcast. I'm Bill Jelen. Well, yesterday in episode 2150, I described the problem. Someone sent in this file where their system is downloading the items that are an order with linefeeds between them. In other words, ALT+ENTER, and see, WRAP TEXT is turned on, and they want to do a VLOOKUP into this LOOKUPTABLE for each of those items. I’m like, what? Why are you doing this? But I covered that yesterday. Let's just try and figure out how to do this.
I actually said, well, Power Query would be the best way to do this but I got stumped on how to do the last part. I said, is it okay if each item ends up at its own row? No, they have to be back in this original sequence. I’m like, that's horrible, but, on my Twitter feed just last week, Tim Rodman, September 27th: “Finally reading this book,” -- I'm guessing it's PowerPivot Alchemy -- “and MrExcel already got his ConcatenateX wish.” I was being a smartass when I did this, asking for PERHAPS ROMANX, but I probably really did want ConcatenateX, and so Tim gave me a heads up that I can now do that in Power BI.
So, I went out to my friends, Rob Collie at Power Pivot Pro and Miguel Escobar, and, you know, they're both authors of great books. I have both of these books, but this feature is too new, not in either book. I said, hey, do you guys know how to do this? And Miguel wins the prize because Miguel was up early this morning or late last night -- I'm not sure which one -- and sent in the code.
Alright, so, here’s the plan in Power Query and this one is so complicated. I never write out a plan in Power Query. I just go do the whole things. I'm going to start with the original data, add an INDEX column so we can keep the items from an order together, SPLIT COLUMN to ROWS using a LINEFEED. This is the second or third time on the podcast I've used this new feature. How cool is that. I had a second INDEX column so we can sort items into the original sequence, and then SAVE AS A CONNECTION.
Then, we’re going to come to the LOOKUP table, make it a table, query from table, SAVE AS CONNECTION -- that was going to be the easiest part right there -- and then merge this query and this query based on item number, all items from the left table, this is the left table, matching from the right, replace nulls with the item number. We're still up in the air on what we want to do when something's not found for some reason. I've asked this question, but the person who sent in the file isn't answering, so I'm just going to replace it with the item number. Hopefully the right thing to do is add more items to the LOOKUPTABLE so there aren't any not founds, but here we are, and then we're going to sort by INDEX1 and INDEX2, so that way, things are back in the right sequence and then this was the part that I could not figure out how to do.
We're going to group by INDEX1 doing the equivalent of a TEXTJOIN or ConcatenateX with the character 10 as the separator, as the aggregator, and, of course, this is the part that's the hard part but it's the part that's really new here in this set of steps. So, if you understand what TEXTJOIN does or can conceptualize what ConcatenateX would have done, we're essentially doing that using this sort of step. So, alright. So, let’s give it a shot.
So, we're going to start here. Here's our original data, has a heading. So, I'm going to FORMAT AS TABLE, CONTROL+T, MY TABLE HAS HEADERS, yes, and then we're going to use Power Query. Now, I'm in Excel 2016 Office 365, so it's over here on the left part of the DATA tab. If you're just in straight Excel 2016, not Office 365, it's in the middle -- GET & TRANSFORM. If you're in Excel 2010 or 2013, it's going to be its own tab out here called Power Query, and if you don't have that tab, you’re going to have to download that tab. If you're on a Mac or Android or any of the other fake versions of Excel, sorry, no Power Query for you. Get a Windows version of Excel and give this a try.
Alright, so, we're going to do a Power Query FROM A TABLE, alright, and the first thing I'm going to do is I'm going to ADD an INDEX COLUMN and I'm going to start FROM 1. Alright, so, this is essentially order 1, order 2, order 3, order 4. Then we're going to choose this column and, on the TRANSFORM tab, we're going to SPLIT COLUMN, BY DELIMITER, and they were able to detect that it is a LineFeed is the delimiter. I love that Power Query is detecting this. Now, why doesn't Excel, text to columns, yeah, text to columns figure out what the delimiter is? And each occurrence we’re going to SPLIT INTO ROWS, and USING SPECIAL CHARACTER. Alright, so all of that’s good.
Now watch what happens here. We have 999 rows but now we have far more than that. So, every item in that order number is now its own row. Now, the person who asked this question doesn't want it to be its own row but we're going to have to make it be its own row so we can do the join. I'm going to add a new INDEX column here. ADD COLUMN, INDEX COLUMN, FROM 1, and so we have…these are essentially the order numbers and then these are the sequence within the order because I've determined that, later, these are going to be in some other order. I don't know what order they switch to but here we are.
Alright, so, HOME, not the CLOSE & LOAD button but the CLOSE & LOAD drop-down, and CLOSE & LOAD TO. I don't know why it takes 10 seconds for them to display this dialog box the first time. We're going to ONLY CREATE CONNECTION. Click OK. Beautiful. So that's TABLE1, TABLE1.
Now, we're going to go to our LOOKUPTABLE. LOOKUPTABLE is going to be easy to process. We're going to format this as a table. CONTROL+T. Click OK. DATA, or POWER QUERY if you’re in an old version, FROM TABLE. This is going to be called TABLE2. Let's call it LOOKUPTABLE. Perfect. CLOSE & LOAD, CLOSE & LOAD TO, ONLY CREATE CONNECTION.
Alright. Now, we have our two bits over here and I want to merge those two. So, we're just going to go to a new spot and then DATA, GET DATA, COMBINE QUERIES, we're going to do a MERGE, and the table on the left is going to be TABLE1 -- that's our original data -- and we're going to use this ITEM number and we're going to marry that up to the LOOKUPTABLE and that ITEM number. It’s really non-intuitive there you have to click on the ITEMS in both cases to define what the key is, and an OUTER join, ALL FROM FIRST, MATCHING FROM SECOND, and, see, there's 40% of these that are missing from the LOOKUPTABLE. This is all fake data but the original data had 40% missing from the LOOKUPTABLE as well. Really kind of frustrating. Alright. So, here's our ITEM number, our 2 INDEX fields, and then our LOOKUPTABLE here. I'm going to EXPAND that and ask for the DESCRIPTION. Alright, you see we have a bunch of nulls here.
Alright, so, we're going to do a conditional column. Conditional column’s going to say look at this column. If it's = to null, then bring this value over, otherwise, use the value that's in that column. So here, under ADD COLUMN, we’ll do CONDITIONAL COLUMN -- nice little UI that'll walk us through this -- if the LOOKUPTABLEDESCRIPTION EQUALS NULL, then we want to use a COLUMN here of ITEMS, otherwise, we want to use the COLUMN called LOOKUPDESCRIPTION, alright. Click OK, and there we are. There's our CUSTOM column with either the new value from the LOOKUPTABLE or the original value if it's not found. At this point, we can right click and say that we want to REMOVE this column. It was a temporary column, it was a helper column. Now that we have what we need, we don't need that column anymore, and actually, at this point, I don't need this column anymore either. So, I can right click and REMOVE that column. Alright. Now, we have our data here. I want to sort it by the original INDEX. So, SORT ASCENDING. That gets our data into the right sequence, and now that it's sorted, I can actually right click and REMOVE that column.
Alright. Now we’re at the point where, for every item, each order number -- so, this is order number 1, let's say -- I want to have these 4 items separated by a LineFeed character. Now, what I was hoping to be able to do was to come here to TRASNFORM. Instead, if we wanted to GROUP BY and that there'd be some magic here in the GROUP BY, I would say I'm going to concatenate or textjoin all those things, but it doesn't work, alright?
So, here's the set of steps that are new to me that allow this to happen. First thing we want to do is we're going to create a brand new column. That column is just going to be called a TABLECOLUMN and we're going to take ALL ROWS and click OK. Okay. So, when we look inside at this table, we see that we have 2 columns -- one called INDEX and one called CUSTOM -- and we have to remember that name there, alright, and this table unfortunately does not work with STRUCTURED COLUMN. See, EXTRACT VALUES is grayed out. So, this doesn't work with a table. It has to work with a list. I have to convert this table to a list, and this is the part I couldn't figure out and the part that Miguel filled in for me.
So, I'm going to create a CUSTOM COLUMN here and I'm going to call it a LISTCOLUMN and we're going to use a function called TABLE.COLUMN and the table is the thing called TABLECOLUMN, and then which column in there is the thing called CUSTOM. Click OK. Alright, and now these are, instead of a table, it is a list. We're home-free now. TRANSFORM, STRUCTURED COLUMN. I'm going to EXTRACT VALUES. I'm going to create a CUSTOM delimiter USING SPECIAL CHARACTERS, INSERT SPECIAL CHARACTER, LINE FEED, and click OK, and it gives me what I'm looking for. So here’s my original order number. The TABLE, we don’t need anymore, right click and REMOVE that, and we now have our original data using the LOOKUPTABLE where we need it, alright? So, I can right-click and REMOVE this, alright, and then finally, HOME, just straight CLOSE & LOAD, which brings it back into a table in Excel. [=Table.Column([TableColumn],“Custom”)]
Alright, but it doesn't look like it worked, does it? That's because, by default, this table does not have WRAP TEXT turned on. So, HOME, WRAP TEXT, and we now have our new data doing the equivalent of a VLOOKUP for each item in the list, and when an item is not found, the original item number is still there, so someone can go piece that back together.
Now, the beautiful thing with Power Query is that while it took us some time to get this set up the first time, the next time we download this list, we just copy it here, and we can even edit something. So, let’s change one. So, MANGO, 4954, we’ll take that 7036 and change it to 4954. Alright, so, now the underlying data has changed, all we have to do is come here to this and click the REFRESH which will refresh all of these items, and we get here to SHEET11 and that second item has changed to a MANGO, alright? It’ll take you some time to set this up once but, once you get it set up, it's just a simple matter of refreshing the data and Power Query will go through all those steps.
Now, hey, this is the point where I usually ask you to go buy my book but, today, let's instead ask you to go buy Miguel's book. Miguel Escobar and Ken Puls wrote this excellent book on M Is For (DATA) MONKEY -- the best book there is on Power Query. Go check that out.
Alright, wrap up: today is a really long episode; we have a viewer, downloads data from a system where each item is separated by ALT+ENTER and we're trying to do a VLOOKUP for each individual item; built a solution today using Power Query including the structured column tool of extract as; but that only works on a list, not a table, so I had to use the TABLE.COLUMN function to convert the table to a list.
Well, hey. I want to thank you for stopping by. We'll see you next time for another netcast from MrExcel.
Download the sample file here: Podcast2151.xlsm
Title Photo: Pixabay