Power Query to transform data ready for pivot

kcmuppet

Active Member
Joined
Nov 2, 2005
Messages
437
Office Version
  1. 365
Platform
  1. Windows
Hello, I'm trying to set-up a power query to transform my data ready to pivot. My starting table has data in this shape/layout:


UserQ1QuestionQ1CategoryQ1AnswerQ2QuestionQ2CategoryQ2AnswerQ3QuestionQ3CategoryQ3AnswerQ4QuestionQ4CategoryQ4AnswerQ58QuestionQ58CategoryQ58Answer
User 1Age?PhysicalHighHeight?PhysicalLowShoe Size?PhysicalMediumPreferred food?PreferenceHighPreferred Colour?PreferenceHigh
User 2Age?PhysicalHighHeight?PhysicalHighShoe Size?PhysicalMediumPreferred food?PreferenceLowPreferred Colour?PreferenceLow
User 3Age?PhysicalMediumHeight?PhysicalHighShoe Size?PhysicalMediumPreferred food?PreferenceLowPreferred Colour?PreferenceLow
User 4Age?PhysicalMediumHeight?PhysicalHighShoe Size?PhysicalHighPreferred food?PreferenceLowPreferred Colour?PreferenceMedium
User 5Age?PhysicalMediumHeight?PhysicalHighShoe Size?PhysicalHighPreferred food?PreferenceMediumPreferred Colour?PreferenceMedium
User 6Age?PhysicalLowHeight?PhysicalMediumShoe Size?PhysicalLowPreferred food?PreferenceLowPreferred Colour?PreferenceMedium


...and I'm trying to get into into a format that will allow me to generate a a pivot table that will be in this format:

Final Output (Excel pivot table)
HighMediumLow
Physical
Age?231
Height?411
Shoe Size?231
Preference
Preferred food?114
Preferred Colour?132


i.e. I want to group by 'Category' and then by 'Question' and count the 'Answer' responses. I have thousands of rows and 58x Question, Question Category and Answer and the fields are not contiguous. It seems to me that I need to find a way to first stack the entries in all the fields ending 'Category', and 'Answer', but I'm struggling to know how to start?

Here is a mock-up of the starting table:

Power Query:
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi1OLVIwVNJRckxPtQdSARmVxZnJiTlApkdmegaISgXSJWhyPvnlQDI4Iz9VITizCl2nb2pKZmkuSKwoNS21qCg1RSEtPz/FHi6SmpecirABoco5Pye/tAi7ulgdqGuNSHYtVI5S50I8TdC1IGVwxxpjdSzcxsHmXBPqOxcjkilxLNQlcPeaDpR7sUQCkU42w+pkiP+xuhduFS4Xo4cdNQI4FgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [User = _t, Q1Question = _t, Q1Category = _t, Q1Answer = _t, Q2Question = _t, Q2Category = _t, Q2Answer = _t, Q3Question = _t, Q3Category = _t, Q3Answer = _t, Q4Question = _t, Q4Category = _t, Q4Answer = _t, Q58Question = _t, Q58Category = _t, Q58Answer = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User", type text}, {"Q1Question", type text}, {"Q1Category", type text}, {"Q1Answer", type text}, {"Q2Question", type text}, {"Q2Category", type text}, {"Q2Answer", type text}, {"Q3Question", type text}, {"Q3Category", type text}, {"Q3Answer", type text}, {"Q4Question", type text}, {"Q4Category", type text}, {"Q4Answer", type text}, {"Q58Question", type text}, {"Q58Category", type text}, {"Q58Answer", type text}})
in
    #"Changed Type"

(cross-posted on the MS powerbi forum)
 
But doesn't the grouping still come out correct?

{
{Q10Question, Q10Category, Q10Answer},
{Q1Question, Q1Category, Q1Answer}
...}

if so you can sort the table at then end by the Question No. column to get the order corrected
 
Upvote 0

Excel Facts

What is =ROMAN(40) in Excel?
The Roman numeral for 40 is XL. Bill "MrExcel" Jelen's 40th book was called MrExcel XL.
But doesn't the grouping still come out correct?

{
{Q10Question, Q10Category, Q10Answer},
{Q1Question, Q1Category, Q1Answer}
...}

if so you can sort the table at then end by the Question No. column to get the order corrected
No, unfortunately it doesn't. It goes:

Q1Question,Q1Category,Q3Question
Q3Question,Q3Category,Q4Question
Q4Question,Q4Category,Q5Question
Q5Question,Q5Category,Q6Question
Q6Question,Q6Category,Q7Question
Q7Question,Q7Category,Q8Question
Q8Question,Q8Category,Q9Question
Q9Question,Q9Category,Q10Question
Q10Question,Q10Category,Q11Question
Q11Question,Q11Category,Q1Answer
Q3Answer,Q4Answer,Q5Answer
Q4Answer,Q5Answer,Q6Answer
Q5Answer,Q6Answer,Q7Answer
Q6Answer,Q7Answer,Q8Answer
Q7Answer,Q8Answer,Q9Answer
Q8Answer,Q9Answer,Q10Answer
Q9Answer,Q10Answer,Q11Answer

etc. to Q58 because the actual data layout wasn't exactly as in the original post. It's more like this, except with 58 Question types and 2 answer types:

Power Query:
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi1OLVIwVNJRckxPtQdSARmVxZnJiTlApkdqZnpGCZpgcEZ+qkJwZhW64oCi1LTUoqLUFIW0/PwUe7hIal5yKoq0c35OfmkRhgIPoF1Ayie/HEj6pqZkluYiRFEkUcVQlcbqQH1kNGh8hO5OiCeweQXN7yAu3D/GA+8f9EjB5yM0Oex+Mhl8fsKbCtE8hSoK95XpoPUVXBinx3BIwP1mNvB+w5KgMASxyqKoiY0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [User = _t, Q1Question = _t, Q1Category = _t, Q2Question = _t, Q2Category = _t, Q3Question = _t, Q3Category = _t, Q4Question = _t, Q4Category = _t, Q58Question = _t, Q58Category = _t, Q1ORAnswer = _t, Q2ORAnswer = _t, Q3ORAnswer = _t, Q4ORAnswer = _t, Q58ORAnswer = _t, Q1CRAnswer = _t, Q2CRAnswer = _t, Q3CRAnswer = _t, Q4CRAnswer = _t, Q58CRAnswer = _t]),
 
Last edited:
Upvote 0
Did you change the tcnQ step to sort as suggested in post #9?

Power Query:
tcnQ = List.Sort(List.Select(tcn, each Text.Start(_,1)="Q")),

if you did there is no way the order is coming out like

Q1Question, Q1Category, Q2Question
Q2Category, Q3Question, Q3Category
Q4Question, Q4Category,Q58Question
Q58Category,Q1Answer,Q2Answer
Q3Answer,Q4Answer,Q58Answer
 
Upvote 0
Did you change the tcnQ step to sort as suggested in post #9?
Power Query:
tcnQ = List.Sort(List.Select(tcn, each Text.Start(_,1)="Q")),


You're right, I made a mistake with that line! (I simply put List.Select(tcn, each Text.Start(_,1)="Q")),🫢

Thank you for your patience.

The working version is like this

Power Query:
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi1OLVIwVNJRckxPtQdSARmVxZnJiTlApkdqZnpGCZpgcEZ+qkJwZhW64oCi1LTUoqLUFIW0/PwUe7hIal5yKoq0c35OfmkRhgIPoF1Ayie/HEj6pqZkluYiRFEkUcVQlcbqQH1kNGh8hO5OiCeweQXN7yAu3D/GA+8f9EjB5yM0Oex+Mhl8fsKbCtE8hSoK95XpoPUVXBinx3BIwP1mNvB+w5KgMASxyqKoiY0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [User = _t, Q1Question = _t, Q1Category = _t, Q2Question = _t, Q2Category = _t, Q3Question = _t, Q3Category = _t, Q4Question = _t, Q4Category = _t, Q58Question = _t, Q58Category = _t, Q1ORAnswer = _t, Q2ORAnswer = _t, Q3ORAnswer = _t, Q4ORAnswer = _t, Q58ORAnswer = _t, Q1CRAnswer = _t, Q2CRAnswer = _t, Q3CRAnswer = _t, Q4CRAnswer = _t, Q58CRAnswer = _t]),
    #"Renamed Columns" = Table.RenameColumns(Source,{{"Q4Question", "Q10Question"}, {"Q4Category", "Q10Category"}, {"Q4ORAnswer", "Q10ORAnswer"}, {"Q4CRAnswer", "Q10CRAnswer"}}),
    tcn = Table.ColumnNames(#"Renamed Columns"),
    tcnQ = List.Sort(List.Select(tcn, each Text.Start(_,1)="Q")),
    tcngroup = List.Accumulate({0..List.Count(tcnQ)/4 -1 }, {}, (s,c)=> s & {List.Range(tcnQ, c*4, 4)}),
    tbl = List.Accumulate(tcngroup, #"Renamed Columns", (s,c)=> Table.CombineColumns(s, c, Combiner.CombineTextByDelimiter(";", QuoteStyle.None), c{0})),
    tbl1 = Table.UnpivotOtherColumns(tbl, {"User"}, "Question No.", "Value"),
    tbl2 = Table.SplitColumn(tbl1, "Value", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"CRAnswer", "Category", "ORAnswer","Question"}),
    digits = List.Transform({0..9}, Text.From),
    Result = Table.TransformColumns(tbl2, {"Question No.", each Number.From(Text.Select(_, digits))}),
    #"Reordered Columns" = Table.ReorderColumns(Result,{"User", "Question No.", "Category", "Question", "CRAnswer", "ORAnswer"})
in
    #"Reordered Columns"
 
Upvote 0
@JGordon11 is there a way to use List.Buffer to speed it up? (with my actual data it's taking 90 minutes to run the query on a table with 2,600 rows). I've been reading around on List.Buffer but can't work out where to put it in your M code.
 
Upvote 0
I extended your sample data to 5000 rows and this code runs in a few seconds. Not sure how slow it will be on your real data that has a lot more columns. I buffered the source table (i.e. buffer whatever table you reference in the tbl step) and the tcngroup list. I changed the tcngroup step to a more efficient function (List.Split).

Power Query:
let
    Source = Table.Buffer(Excel.CurrentWorkbook(){[Name = "Table1"]}[Content]),
    tcn = Table.ColumnNames(Source),
    tcnQ = List.Sort(List.Select(tcn, each Text.Start(_,1)="Q")),
    tcngroup = List.Buffer(List.Split(tcnQ,4)),
    tbl = List.Accumulate(tcngroup, Source, (s,c)=> Table.CombineColumns(s, c, Combiner.CombineTextByDelimiter(";", QuoteStyle.None), c{0})),
    tbl1 = Table.UnpivotOtherColumns(tbl, {"User"}, "Question No.", "Value"),
    tbl2 = Table.SplitColumn(tbl1, "Value", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"CRAnswer", "Category", "ORAnswer","Question"}),
    digits = List.Transform({0..9}, Text.From),
    Result = Table.TransformColumns(tbl2, {"Question No.", each Number.From(Text.Select(_, digits))}),
    #"Reordered Columns" = Table.ReorderColumns(Result,{"User", "Question No.", "Category", "Question", "CRAnswer", "ORAnswer"})
in
    #"Reordered Columns"
 
Upvote 0
Solution
I don't think you would buffer these sources as they are Excel - they don't query fold.
 
Upvote 0
I extended your sample data to 5000 rows and this code runs in a few seconds. Not sure how slow it will be on your real data that has a lot more columns. I buffered the source table (i.e. buffer whatever table you reference in the tbl step) and the tcngroup list. I changed the tcngroup step to a more efficient function (List.Split).
Thanks - this approach reduced the refresh time with my data set from 90 minutes to 30 minutes (2,600 rows, 400 columns)
 
Upvote 0

Forum statistics

Threads
1,215,098
Messages
6,123,082
Members
449,094
Latest member
mystic19

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top