Quick Source Change?

MickeyPvX

New Member
Joined
Feb 14, 2011
Messages
45
Hey everybody! Been a while since I've been on here, and the first time on the Power BI section.

Anyways, I've been dabbling with Power Query to pull NCAA Basketball data off of a stats site to mess around with. However, the way the site is set up, the columns I want to add to my large table are all at different URLs, so I've built like 50 queries all pointing to different URLs to pull the data off the tables at each one. I'm pretty new to Power Query, so I have two questions:

1. Is there a better way to do that, i.e. can I build one query pointing to all the different sources and pull the data as I need it? Right now I have another query that merges all the individual ones as a solution.

2. Now I want to pull data from another year, which is another 50-ish URLs, but I really would rather not brute-force it like the first time. So I guess this will be easier if the answer to Question 1 is, "Yep, and here's how you do it..." Originally I was thinking about writing some sort of script to change all the source URLs from the original queries.

Any help is appreciated, let me know if I can provide any more helpful information.
 

Some videos you may like

Excel Facts

Which came first: VisiCalc or Lotus 1-2-3?
Dan Bricklin and Bob Frankston debuted VisiCalc in 1979 as a Visible Calculator. Lotus 1-2-3 debuted in the early 1980's, from Mitch Kapor.

scottsen

Well-known Member
Joined
Mar 16, 2014
Messages
1,263
So, the short answer is "Yes, you can do that by writing a function which pull 1 page... then calling the function multiple times". The devil will be in the details.

You are about to write some custom M script, so... maybe go buy Ken Puls new book :)
 

MickeyPvX

New Member
Joined
Feb 14, 2011
Messages
45
So, the short answer is "Yes, you can do that by writing a function which pull 1 page... then calling the function multiple times". The devil will be in the details.

You are about to write some custom M script, so... maybe go buy Ken Puls new book :)
Ah, is that the, "M is for (Data) Monkey," book? Looks like they've got a deal on it on Amazon Prime. In the meantime, I'll keep working on it to see if I can make any headway. Thanks!
 

ImkeF

Well-known Member
Joined
Feb 1, 2015
Messages
544
Normally I'd expect These URL to contain some strings that are the same and only some parts to be different. If you split them up in different columns in Excel and concatenate them (in Excel or even in Power Query, both using "&") it would be sufficient just to change the variable parts.
 

MickeyPvX

New Member
Joined
Feb 14, 2011
Messages
45
Normally I'd expect These URL to contain some strings that are the same and only some parts to be different. If you split them up in different columns in Excel and concatenate them (in Excel or even in Power Query, both using "&") it would be sufficient just to change the variable parts.
It's proving a bit more difficult than the example I linked to earlier; in his example he's got all the columns he wants but just multiple pages of the same columns at different URLs. In my case, every page I go to should add another column to the table, with the Team name being the common one. So I kind of want to have something that does a buttload of merges based on what data it pulls from the sites.

Granted, the example gave me a good place to start, I think, but I'm still not figuring out how to get M to do what I want with the list of URLs I've given it. I got it to pull the data correctly, now I just need it to put it in the right order. That part's harder :/
 

MickeyPvX

New Member
Joined
Feb 14, 2011
Messages
45
OK, I've figured out a way to do it, although it still has a bit of brute-force, "copy paste code," in the final data pull query.

I created a function based on the one in the example, but added on a bunch of nested joins to merge all the sources together. I tried for a while to figure out how to get it to run through each one like the sources, but just gave up on it. Here's some code, any input is welcome:

Code:
(Page as text) as table =>
let
    Source = Web.Page(Web.Contents("https://www.teamrankings.com/ncaa-basketball/stat/" & Page)),
    Data0 = Source{0}[Data],
    ColumnsNeeded = Table.SelectColumns(Data0,{"Team", "2013"}),
    Rename = Table.RenameColumns(ColumnsNeeded,{{"2013", Page}})
in
    Rename
This is my function I call over and over again in the next step, a combination of managing the page names and then joining all the tables together. This is where it gets clunky:

Code:
let
Source = //list of all the pages I'm looking for...

ToTable = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),

Data = Table.AddColumn(ToTable, "DataPull", each GetData([Column1])),

Join1 = Table.NestedJoin(Data{0}[DataPull],{"Team"},Data{1}[DataPull],{"Team"},"NewCol",JoinKind.Inner),
    Expand1 =  Table.ExpandTableColumn(Join1, "NewCol", {Source{1}}, {Source{1}}),
Join2 = Table.NestedJoin(Expand1,{"Team"},Data{2}[DataPull],{"Team"},"NewCol",JoinKind.Inner),
    Expand2 =  Table.ExpandTableColumn(Join2, "NewCol", {Source{2}}, {Source{2}}),

//etc. etc. clunkiness...

Join53 = Table.NestedJoin(Expand52,{"Team"},Data{53}[DataPull],{"Team"},"NewCol",JoinKind.Inner),
    Expand53 =  Table.ExpandTableColumn(Join53, "NewCol", {Source{53}}, {Source{53}}),

RPIJoin1 = Table.NestedJoin(Expand53,{"Team"},FormatRPI{0}[Custom],{"Team"},"NewCol",JoinKind.Inner),
    RPIExpand1 = Table.ExpandTableColumn(RPIJoin1, "NewCol", {"rpi-rating-by-team"}, {"rpi-rating-by-team"}),
RPIJoin2 = Table.NestedJoin(RPIExpand1,{"Team"},FormatRPI{1}[Custom],{"Team"},"NewCol",JoinKind.Inner),
    RPIExpand2 = Table.ExpandTableColumn(RPIJoin2, "NewCol", {"sos-rpi-rating-by-team"}, {"sos-rpi-rating-by-team"}),

TrimPercent = //take the "%" off of the percentages...

ToNumber = //large chunk for converting all the data to numbers...

OrderColumns = //chunk for putting the columns where I want them

in
    OrderColumns
Yeah, 55 nested joins. There's another function I built called FormatRPI because the format of those tables is a little different that the other 53 attributes I'm looking for. It does the same thing as the GetData function, just has a couple more formatting steps in it. Anyways, now that it's built all I have to do is go back into the functions I'm calling and modify the URLs to point them towards whatever year I want to pull. Hooray!
 

ImkeF

Well-known Member
Joined
Feb 1, 2015
Messages
544
well done!
From your description I would have suggested to do a pivot in order to create the additional columns - but really hard to tell without looking at the actual data.
 

Watch MrExcel Video

Forum statistics

Threads
1,095,149
Messages
5,442,659
Members
405,191
Latest member
wedloski

This Week's Hot Topics

  • Copy entire row if CountA <>0 to another sheet
    [B]I want to copy entire row if CountA <>0 for column J7:AM7 (headers on J6:AM6) and so on till the last used cell is column D and paste the...
  • Select last used Row in Table
    I have created a Table in a Worksheet which is locked to prevent user errors and protect formula. Some of the cells require freetext entries which...
  • excel workbook: do not allow certain file name
    Hello all, Don't think this has ever been asked before, but how do I restrict file save [Before_Save Event] if the name of the file being saved...
  • fixing problem autofilter
    hello i need help about my code when i search by code in textbox it doesn't show anything this is my data [ATTACH type="full"...
  • “Weight”
    Hi, i’ve got a long sheet filled with weights such as kg,g,L & ml. i can build a formula to convert kg into g and liter into ml. How ever, my...
  • How to capitalize everything before a certain character?
    In column A, I have some text: Hello good day.mp3 Hello good day.flac etc. I'd like to capitalize everything before the period. I don't need the...
Top