Some Guidance please

JeffGrant

Well-known Member
Joined
Apr 7, 2021
Messages
519
Office Version
  1. 365
Platform
  1. Windows
I have a java API that downloads and combines csv files from a data site, ready for import into excel.
Basically, the API gets a particular file, then loops through all of the lines in the file to pull down the relevant data for the day.

The data suppler also supplies the data as json files and they are making noises about stopping the supply of the csv files.

With PQ, i can see there are many videos about pulling a single data file with a known URL, but i have not seen anything that would effectively loop through an input list to create the URL as an input variable.

Is this possible?

I am a real novice with PQ, and a little above novice with VBA, so I can grab the jist of it without claiming be a professional. so I do understand the technical parts of what i am asking here.

I look forward to hearing from your guys soon.
 

Excel Facts

Show numbers in thousands?
Use a custom number format of #,##0,K. Each comma after the final 0 will divide the displayed number by another thousand
Certainly much easier with JSON data. I don't know the exact URL pattern you have, but it should be taking an index number, or maybe an id, or a string as a parameter. The following sample basically shows how I would get data for each URL that I have (or generated as in the sample), add it as a new column, and then transform it. Look at each step to see the details.

This is just to give an idea. In the real world, it wouldn't be very efficient to make a single call for each entity for hundreds of rows, but it depends on the data size, and how it is served by the API.

Power Query:
let
    // Create a list of index numbers to be used as the query value in the API URL
    Index = {1..10},

    // Generate a list of URLs from the index numbers 
    URLs = List.Transform(Index, each "https://jsonplaceholder.typicode.com/posts/" & Text.From(_)),

    // Create a table from the URL list
    Table = Table.FromList(URLs, Splitter.SplitByNothing(), {"URL"}),

    // Add a new column as the returned value from the API for each URL
    GetData = Table.AddColumn(Table, "Data", each Json.Document(Web.Contents([URL]))),

    // Transform the returned record as needed
    Result = Table.ExpandRecordColumn(GetData, "Data", {"userId", "id", "title", "body"})
in
    Result
 
Upvote 0
Solution
Hi Smozgur, thanks for your contribution, perhaps I need to drop my skill level from Novice to Novis - :) because this is above my skill set.

I do appreciate this comment...
In the real world, it wouldn't be very efficient to make a single call for each entity for hundreds of rows.

when I look at the source java code of API, the URL calls are not individual or loops (*), and I guess that is where I am getting lost.

Essentially, there are 5 only main url calls. of those 5, 3 pull data that has less than 30 lines of data in the single csv file as an absolute maximum.

The forth file could easily be up to a 100 rows, so that could be up to 100 separate url calls and the (*) 5th is well and truley up in the thousands of rows of data, but that is based upon how urls calls are made based on the contents of the 4th file.

I suspect this is why the data supplier is pushing towards json data. It would be much easier to manupulate and add on to, where the csv files are a cumbersom brick.

I understand that almost everybody tells me that json is much easier than csv, but I was was born and bread on csv so it is very easy for me to understand and manipulate.

Perhaps i need to head over to YouTube and put my learning cap on :)
 
Upvote 0
Sorry, I thought a simple example would give an idea about looping through the URLs to download each page individually as a new column but I even didn't explain how to use it. You just need to copy and paste that code into a blank query (in the Advanced Editor in the Power Query window). Then you'll close the Advanced Editor and get the data (you might need to allow connection if you get a credentials error or message).

I was was born and bread on csv so it is very easy for me to understand and manipulate.
We are all the same! :)
The good news is JSON is certainly 90% easier than CSV (in Power Query, not Excel or VBA). It is more or less a raw database actually, and once you get the tabular data, Power Query does the rest.
The web service (API) is also very important in providing a well-designed data structure of course. As I mentioned, trying to access 100s of URLs to fetch data is not the best way, instead, APIs usually return data in a paginated structure to allow client applications to download all at once which will avoid many individual calls. This is really not like our days in the past when we plan and build everything as we need but simply depend on what and how those web services return now.

Power Query is very good at transforming structured data, and I am sure you'll love it once you get into it. Try importing the URLs by using the Get Data -> From Web option, you'll get data beautifully aligned as list items that you can then use the Power Query UI to apply some transformation. At some point, you would need to edit the M code in the background because the UI won't be enough, but even before that, you'll see its power and enjoy discovering commands and methods. You'll be able to join the different data sources with foreign IDs and create all kinds of reports that you'll need in Excel.

I wish we could see more details to help but I believe those are sensitive files and URLs shouldn't be in public. Still, you can get to some point (and I believe it won't take so long), then ask for help when/if you are stuck somewhere. There are lots of great Power Query members on the board who'd like to help.
 
Upvote 0
Hi Smozgur,

I have spent last night and today studying the use of Lists & JSON Files in PQ. Expanding on your example, I feel I can now ask the RIGHT questions.

1. I have a query that pulls a table everyday that looks like this:
TrackCount
Ballina
7​
Cairns
7​
Mornington
8​
Murray Bridge GH
8​
Warwick Farm
8​

2. I have a function, that pulls data from a URL, which relies on the above input table like this:

myURL = "DefaultPath/Balina/" & Number.ToText(RaceNumber) & "/" & Date & "?ApiKey=12345",

This function asks me to input the variable RaceNumber, and where the URL has the string text "Balina" which is hard coded. This is all good because I can changed that manually.

What I don't understand, and this is a bit like a nested loop in other languages, in the URL, how do I set one variable to "Track", the other variable to RaceNumber",

then invoke the function with the External Loop is Track name, nested loop is RaceNumber, up to and including the value of Count.

So that the query pulls back data for:
Balina/1/, then Balina/2/....Balina/7/, then
Cairns/1/, then Cairns/2/...Cairns/7/ etc......

Then saves all of these individual data pulling instances as a single merged table (in your example, your were talking about columns, but I need rows)

Then, to top this off, the JAVA API I have, does all of this in memory and only saves the merged csv file, ready for an Excel Import.

I think I am on the right track here, but absolutely respect any advice and guidance you can give.


So basically the question is this, how can I do a nested loop in PQ? I cannot find a single video about nested loops in PQ on youtube :(

Thanks for your help
 
Upvote 0
There are two loop functions in M that you can think like the For...Next or Do...While functions in VBA. List.Generate() and List.Accumulate() functions.

However, you usually won't need these functions, because you'll see that transformation functions in M already work on row/item iteration. See. the List.Transform() and Table.AddColumn() functions in the code in post#2. See how "each" takes the current row and applies the necessary transformation to generate the result. That parameter of the AddColumn() function starting with "each" is actually a function that iterates ALL rows in the table, so it creates the new column with the values calculated in the generator function.

Of course, there are certain situations that you'll need to utilize List.Generate() and List.Accumulate() functions but not for this project so far. That's because the Table.AddColumn() function will do what you need at this point. And List.Generate() and List.Accumulate() functions might not be the best functions depending on the row size in a project. I don't recommend getting confused with it yet but if you are really curious, then you can see my Excel Article about List.Generate(): Power Query List.Generate vs. VBA For...Next.

Please copy and paste the following code into the Advanced Editor and see how it works step by step to generate the URL column that could be used in the next step of this process. (I used the sample data as a base64 encoded text value in the following sample to make it easy to see the result quickly before adapting it to your own table. ) I tried to explain each line in the code. By the way, this could be all done by using the ribbon command on the interface.
Power Query:
let
   // The following is the sample data, remove it when you are ready to work with your own table,
   // and enable the next Tracks line to work with your own data
    Tracks = Table.FromRows(
                Json.Document(
                    Binary.Decompress(
                        Binary.FromText("i45WckrMycnMS1TSUTJXitWJVnJOzCzKK4ZzffOL8jLz0kvy84BCFhCh0qKixEoFp6LMlPRUBXcPuER4YlF5ZnK2gltiUS5EMBYA", BinaryEncoding.Base64),
                        Compression.Deflate)
                    ),
                {"Track", "Count"}
            ),
    // Tracks = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

    // Race numbers could be taken from another table. I am just creating a list of numbers and converting it to a table below
    RaceNumber = Table.FromList({1..10}, Splitter.SplitByNothing(), {"RaceNumber"}),

    // It is time to add the race numbers to the tracks table.
    // I am initially adding the entire race numbers table as a row value for each track
    // So, each tracks now has a cell contains a table consists of numbers starting from 1 to 10
    TracksWithRaceNumbers = Table.AddColumn(Tracks, "RaceNumber", each RaceNumber),

    // Now I can expand that column contains race number tables
    // The following function repeats each tracks row for each race number row
    ExpandRaceNumber = Table.ExpandTableColumn(TracksWithRaceNumbers, "RaceNumber", {"RaceNumber"}),

    // Finally the URLs that needs to be generated for each Tracks and RaceNumber.
    // So, if we have 5 tracks and 10 race numbers, it means we will have 50 unique URLs
    // This is the "loop" that you need
    GenerateURL = Table.AddColumn(ExpandRaceNumber, "Custom", each Text.Combine({"/", [Track], "/" , Text.From([RaceNumber]), "/"}))
in
    GenerateURL

Right, this is not like VBA, and I admit that I often find myself trying to create solutions in M as I would solve them in VBA. However, most of the time, there are much more effective functions in M that won't need the methods we would use in VBA. I just keep saying to myself "Remember, this is all about transforming data in tabular format, just a chapter in VBA or any other programming language". And I try to stay in M by thinking that there is absolutely a function or method that exists for whatever I am trying to do.

Note: This is not the entire solution you need, but just to explain the first step to generate URLs. Once you feel comfortable with this code and concept so far, we can continue with the next steps. I believe it would be more useful like this, step by step.
 
Upvote 0
Hi Smozgur,

works like a treat and very fast :). I am working on getting my head around the way the inbuilt functions work. Where I fall down a bit is that I was raised on Fortran 66 & 77 which is very highly structured language and the programmer had to account for EVERY step of the process. The way these "modules" work where we only hook into exiting code and pick up the result, without having to program each step in between is just so foreign to me. But I am getting there. :)

I understand how you have just added a stock list of race numbers to the track name for the example, so using this theory, would we create a separate list for each track so that when the race numbers are added to the track, they would all have their own list for the Table.FromList line rather than my crazy old thinking of a nested loop? Not trying to get ahead of the game, just trying to expand my knowledge of the structure of M.

I believe the next part of this learning curve would be to understand how to build the number of URLs based upon the Track Count ( I suspect)

Chat soon & Thank you very verymuch
 
Upvote 0
Where I fall down a bit is that I was raised on Fortran 66 & 77 which is very highly structured language and the programmer had to account for EVERY step of the process. The way these "modules" work where we only hook into exiting code and pick up the result, without having to program each step in between is just so foreign to me. But I am getting there. :)
I remember my first Fortran application. It should be part of my late 1980s! I can't believe the tools that we had back then and what we were able to accomplish, and now. Phew! :)

Anyway, time for back to the future. My approach would be loading the returned result into the new column cells for each URL, and then expanding those cells as necessary to form what I would need. I just don't think it would be as easy as in my initial sample code. It depends on the data files that the API returns.

But, I think the URL generation should be cleared first. I believe I understood the actual URL pattern wrong initially. Could you please give me a sample of URL, including track, race number, and also track count for only one track? When I saw you sample URLs, Balina/1/, then Balina/2/....Balina/7/, I thought these numbers were race numbers, but I now understand those are variations starting from 1 to the Track Count for each track. Where is the Race Number located in the URL in this case? Something like "/Balina/{RaceNumber}/{1 to TrackCount}/"? Or maybe it is a query parameter like "/Balina/{RaceNumber}/?tc={1 to TrackCount}"?
 
Upvote 0
Hi Smosgur

Please note that the full urls below give you access to my data download so you can see the whole picture.

Basically, the meeting list is fed into the other urls, because the data supplier stores each track and race as individual files.

I also download other csv files because my model manipulates the data in certain ways.

------

I am happy to send you the Java App (which is a windows exe file) that I have along with the source code, if that helps us out more......
 
Last edited by a moderator:
Upvote 0
Hi Jeff,

The links you posted contain the API key, and I am not sure if that's ok to publish that information since it will be publicly accessible. So, I removed the links from your post. Please let me know if that's ok to keep the links in public, then I can restore them.
 
Upvote 0

Forum statistics

Threads
1,215,963
Messages
6,127,960
Members
449,412
Latest member
montand

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top