Power Query web scrape

gargamalebarbosa

Board Regular
Joined
Aug 4, 2022
Messages
118
Office Version
  1. 365
Platform
  1. Windows
Hi ,
I want to retrieve fuel price archives from the URL below. There are 3 criteria. 1. City name. 2. District name. 3. Date range. After entering these criteria and clicking the search button, I want to fetch the resulting table using Power Query. Is this possible?

Thank you all.






1696281165848.png
 

Excel Facts

Shade all formula cells
To shade all formula cells: Home, Find & Select, Formulas to select all formulas. Then apply a light fill color.
The following is the API base URL and endpoint:
https://api.opet.com.tr/api/fuelprices/prices/archive

It takes the following parameters:
DistrictCode: 934015 (for Istanbul / Besiktas)
StartDate: 2023-09-02
EndDate: 2023-10-02
IncludeAllProducts=true (returns all product types)

So, the following URL returns prices as a JSON string:
https://api.opet.com.tr/api/fuelprices/prices/archive?DistrictCode=934015&StartDate=2023-09-02&EndDate=2023-10-02&IncludeAllProducts=true

At this point, you can use the following M code to get the data for the provided parameters (you will need to allow access to the URL when it asks for it).

Power Query:
let
    DistricCode = "934015", // Istanbul - Besiktas
    StartDate = #date(2023, 9, 2), 
    EndDate = #date(2023, 10, 2), 
    Source = Json.Document(
                Web.Contents(
                    "https://api.opet.com.tr/api/fuelprices/prices/archive" & 
                    "?DistrictCode=" & DistricCode & 
                    "&StartDate=" & Date.ToText(StartDate, "YYYY-MM-DD") & 
                    "&EndDate=" & Date.ToText(EndDate, "YYYY-MM-DD") & 
                    "&IncludeAllProducts=true"
                    )
                ),
    ConvertToTable = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExpandRecords = Table.ExpandRecordColumn(ConvertToTable, "Column1", {"prices"}),
    ExpandListValues = Table.ExpandListColumn(ExpandRecords, "prices"),
    ExpandPricesByProduct = Table.ExpandRecordColumn(ExpandListValues, "prices", {"priceDate", "productName", "amount"}),
    PivotColumns = Table.Pivot(ExpandPricesByProduct, List.Distinct(ExpandPricesByProduct[productName]), "productName", "amount"),
    ColumnNames = List.RemoveFirstN(Table.ColumnNames(PivotColumns), 1),
    ChangeColumnTypes = Table.TransformColumnTypes(PivotColumns,{{"priceDate", type datetimezone} & List.Transform(ColumnNames, each {_, type number})})
in
    ChangeColumnTypes

For the second part, you are going to need some VBA. However, this will require downloading all city and district codes related to each city (the API endpoints mentioned in the next paragraph can be used for that).

Another approach, which actually I would go that way, could be also using Power Query to fetch the city and district data (name and code columns) by using the same API and the following endpoints:
https://api.opet.com.tr/api/fuelprices/provinces : Returns all cities
https://api.opet.com.tr/api/fuelprices/provinces/934/districts : Returns Istanbul/Avrupa districts

Then loading these data into two tables in a separate worksheet and using data validation to access name columns from this tables (you will need INDIRECT function for the data validation list to do this). Need a named range to create data validation for cities, and another named range to load the districts of the selected city that will be read by a query in PQ and then find the corresponding code for the selected items and generate the URL in the first step. Of course city and district selection part will be ideally handled by VBA unless you'd like to refresh tables manually.

I suggest examining the first part to understand it first. The main problem is how to get the data properly from the web service that I already provided it as the first part above. Then you can try to implement the second part by using whichever method sounds easy for you (all VBA or PQ/little VBA). Honestly, this is kind of a project that wouldn't normally fit into a single question and I certainly can't promise as I am currently very busy, but I'll try to provide a PQ solution for the second part to complete this idea in this thread unless someone else shows up to provide another approach or take it from here or you complete it yourself before I can get back. If you do so, please share how you did that to help future readers.

Let us know if you have any issues with the code above.
 
Upvote 0
Hello, first of all, thank you very much for the quick response and detailed explanation. I'll conduct experiments and provide information about the results. I want you to know that I'm asking these types of questions just to improve my Power Query skills, not for any professional work. Thanks again. / Teşekkür eder sağlıklı günler dilerim.
 
Upvote 0
I tried the code, and it works very well.
Finally, what I want to do is to retrieve criteria from cell values. For example, F1 = start date, F2 = end date, F3 = District Code. If this can be organized, the issue will be fully resolved.
Thank you again,
 
Upvote 0
Here is the steps creating a VBA/PQ combined application that is retrieving data from web service (API).

  • Open a new workbook. Create two worksheets named "Prices" and "Settings".
  • Prices worksheet: Create the following names.
    A2: cityName
    B2: districtName
    C2: startDate
    D2: endDate
    Finally, including headers for visibility, you'll have the following range. (do not enter the values, they are only for illustration purposes below)
    Book1
    ABCD
    1CityDistrictStart DateEnd Date
    2SAKARYASERDİVAN9/2/202310/2/2023
    Prices
  • Open Power Query editor
    • Create a new blank query, open the Advanced Editor, copy and paste the following code, and save the query as Cities. This is the query retrieves the city names.
      Power Query:
      let
          Source = Json.Document(Web.Contents("https://api.opet.com.tr/api/fuelprices/provinces")),
          ConvertToTable = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
          ExpandRecords = Table.ExpandRecordColumn(ConvertToTable, "Column1", {"code", "name"}),
          ReorderColumns = Table.ReorderColumns(ExpandRecords,{"name", "code"})
      in
          ReorderColumns
    • Close & Load To... this query into the Settings worksheet, A1 cell. The following shows the beginning rows of the retrieved and transformed data in the Settings worksheet.
      Book1
      AB
      1namecode
      2ADANA1
      3ADIYAMAN2
      4AFYONKARAHİSAR3
      5AĞRI4
      6AKSARAY68
      7AMASYA5
      8ANKARA6
      9ANTALYA7
      10ARDAHAN75
      Settings
  • Go back to the Prices sheet.
    • Create a List type Data Validation for A2 (cityName) cell, and use the following formula in the Source field as the validation criteria. Using INDIRECT since we can't refer to a table directly as Data Validation list source.
      Excel Formula:
      =INDIRECT("Cities[name]")
      Skip B2 (districtName) for now.
  • Reopen Power Query
    • Create another query by using the following M code, name it as Districts.
      Power Query:
      let
          // Get the city name from the name range in the worksheet.
          CityName = Excel.CurrentWorkbook(){[Name="cityName"]}[Content],
          // Find the corresponding city id by merging the selected city name with the Cities query
          Merge = Table.NestedJoin(CityName, {"Column1"}, Cities, {"name"}, "Cities", JoinKind.Inner),
          City = Table.ExpandTableColumn(Merge, "Cities", {"code"}),
          CityCode = Text.From(City{0}[code]),
      
          // Retrieve district data for the selected city code and transform
          // Examine steps to understand the transformation
          DistrictData = Json.Document(Web.Contents("https://api.opet.com.tr/api/fuelprices/provinces/" & CityCode & "/districts")),
          ConvertToTable = Table.FromList(DistrictData, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
          ExpandRecords = Table.ExpandRecordColumn(ConvertToTable, "Column1", {"code", "name"}),
          ReorderColumns = Table.ReorderColumns(ExpandRecords,{"name", "code"})
      in
          ReorderColumns
    • Close & Load To... this new query into the Settings worksheet, D2 cell as shown below.
      1696361464968.png
    • You'll see the following Districts table is created next to the previously created Cities table:
      Book1
      ABCDEF
      1namecodenamecode
      2ADANA1ADAPAZARI054014
      3ADIYAMAN2AKYAZI054002
      4AFYONKARAHİSAR3ARİFİYE054003
      5AĞRI4ERENLER054016
      6AKSARAY68FERİZLİ054015
      7AMASYA5GEYVE054004
      8ANKARA6HENDEK054005
      9ANTALYA7KARAPÜRÇEK054013
      10ARDAHAN75KARASU054006
      11ARTVİN8KAYNARCA054007
      12AYDIN9KOCAALİ054008
      13BALIKESİR10MERKEZ054001
      14BARTIN74PAMUKOVA054009
      15BATMAN72SAPANCA054010
      16BAYBURT69SERDİVAN054017
      17BİLECİK11SÖĞÜTLÜ054012
      18BİNGÖL12TARAKLI054011
      19BİTLİS13
      Settings
  • It is time to create Data Validation for B2 cell in Prices worksheet.
    • Create another List type Data Validation for B2 (districtName) cell, and use the following formula in the Source field as the validation criteria. Again, you need to use INDIRECT function.
      Excel Formula:
      =INDIRECT("Districts[name]")
      Now, both cityName and districtName cells have a drop down that you can select city and district to retrieve prices.
      1696364030896.png
  • One last time, open Power Query
    • Create the final query that I have initially posted in my first reply by using the following code and name it as Prices. This one will be slightly different to get the district code from the Districts table based on the selection in B2 cell (districtName). You'll notice the first four lines that gets the district code from the Districts query to be used in the API endpoint.
      Power Query:
      let
          DistrictName = Excel.CurrentWorkbook(){[Name="districtName"]}[Content],
          Merge = Table.NestedJoin(DistrictName, {"Column1"}, Districts, {"name"}, "Districts", JoinKind.Inner),
          District = Table.ExpandTableColumn(Merge, "Districts", {"code"}),
          DistrictCode = Text.From(District{0}[code]),
      
          StartDate = Date.From(Excel.CurrentWorkbook(){[Name="startDate"]}[Content]{0}[Column1]),
          EndDate = Date.From(Excel.CurrentWorkbook(){[Name="endDate"]}[Content]{0}[Column1]),
          PriceData = Json.Document(
                      Web.Contents(
                          "https://api.opet.com.tr/api/fuelprices/prices/archive" &
                          "?DistrictCode=" & DistrictCode &
                          "&StartDate=" & Date.ToText(StartDate, "YYYY-MM-DD") &
                          "&EndDate=" & Date.ToText(EndDate, "YYYY-MM-DD") &
                          "&IncludeAllProducts=true"
                          )
                      ),
          ConvertToTable = Table.FromList(PriceData, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
          ExpandRecords = Table.ExpandRecordColumn(ConvertToTable, "Column1", {"prices"}),
          ExpandListValues = Table.ExpandListColumn(ExpandRecords, "prices"),
          ExpandPricesByProduct = Table.ExpandRecordColumn(ExpandListValues, "prices", {"priceDate", "productName", "amount"}),
          PivotColumns = Table.Pivot(ExpandPricesByProduct, List.Distinct(ExpandPricesByProduct[productName]), "productName", "amount"),
          ColumnNames = List.RemoveFirstN(Table.ColumnNames(PivotColumns), 1),
          ChangeColumnTypes = Table.TransformColumnTypes(PivotColumns,{{"priceDate", type datetimezone} & List.Transform(ColumnNames, each {_, type number})})
      in
          ChangeColumnTypes
    • Make sure your queries are exactly in the following order before the next step.

      1696362606340.png


    • Close and Load To... this query into A4 cell in the Prices sheet as shown below.
      1696361987503-png.99665
    • You'll see the following result:
      Book1
      ABCDEFGH
      1CityDistrictStart DateEnd Date
      2SAKARYASERDİVAN9/2/202310/2/2023
      3
      4priceDateKurşunsuz Benzin 95Motorin EcoForceMotorin UltraForceGazyağıFuel OilYüksek Kükürtlü Fuel OilKalorifer Yakıtı
      59/2/2023 0:0037.3438.4138.4534.0823.5720.2927.27
      69/7/2023 0:0037.3438.4138.4534.0824.5221.9428.62
      79/9/2023 0:0037.3938.4638.534.0824.5221.9428.62
      89/13/2023 0:0037.3940.4940.5336.2824.5221.9428.62
      99/14/2023 0:0039.0340.4940.5336.2824.5221.9428.62
      109/15/2023 0:0039.0840.5340.5736.2824.5221.9428.62
      119/16/2023 0:0039.0840.5340.5736.2825.1723.2929.97
      129/22/2023 0:0039.1440.5940.6336.2825.1723.2929.97
      139/27/2023 0:0037.240.5940.6336.2825.1222.5929.37
      149/29/2023 0:0037.2640.6540.6936.2825.1222.5929.37
      Prices
    • To preserve the column widths, select a cell in this new table, Data->Properties and uncheck the "Adjust column width" option.
      1696362219876.png
  • Almost done. Now it is time to use some VBA help to make this automated.
    • Go to VBA, Developer->Visual Basic (or Alt + F11).
    • Find the Sheet1 (Prices) worksheet in the Microsoft Excel Objects in the VBAProject window and copy and paste the following code into the sheet class module opened in the right pane:

    • VBA Code:
      Private Sub Worksheet_Change(ByVal Target As Range)
          If Not Application.Intersect(Target, ThisWorkbook.Names("cityName").RefersToRange) Is Nothing Then
              ThisWorkbook.Queries("Districts").Refresh
              ThisWorkbook.Names("districtName").RefersToRange.Value = ""
              With Me.ListObjects("Prices")
                  If Not .DataBodyRange Is Nothing Then
                      .DataBodyRange.Delete xlUp
                  End If
              End With
          ElseIf Not Application.Intersect(Target, ThisWorkbook.Names("districtName").RefersToRange) Is Nothing Or _
                  Not Application.Intersect(Target, ThisWorkbook.Names("startDate").RefersToRange) Is Nothing Or _
                  Not Application.Intersect(Target, ThisWorkbook.Names("endDate").RefersToRange) Is Nothing Then
              If Not IsEmpty(ThisWorkbook.Names("districtName").RefersToRange) Then
                  ThisWorkbook.Queries("Prices").Refresh
              End If
          End If
      End Sub
      This code will make sure refreshing the queries in a certain order on certain actions (changing the named cells in Prices worksheet).
      Note: This code could be optimized, I just didn't have chance to do that. The optimization won't change how it works but how it looks.
  • And testing time. Go back to Prices sheet and select a city in A2. It will clear the selected (if any) district name, load the selected city districts, and you can now select a district to retrieve prices. It will also update the prices table when you change the start/end date values.
Some important notes:
  1. Not all districts return data. So, you can see a warning if there is no data for the requested district. (I didn't customize it).
  2. You will need to select Always ignore Privacy Level settings in Query Options for this workbook (Data->Get Data->Query Options and Privacy section)
    1696362884092.png
  3. I recommend hiding the Settings worksheet since you have nothing to do with it.
  4. This could be easily changed to work with a Search button as well. In that case, you'll remove the Worksheet_Change event code that we created in the last step, and write your own code for Search button to refresh the Prices query. It will be a one liner as shown below:
    VBA Code:
    Sub SearchButton_Click()
        ThisWorkbook.Queries("Prices").Refresh
    End Sub
Let me know how it goes. Then I'll try to upload the full sample workbook once you got it working at your side.
(Memlekete sevgi ve selamlar.)
 

Attachments

  • 1696361987503.png
    1696361987503.png
    144.2 KB · Views: 44
Upvote 2
Solution
No matter how much I say thank you, it's not enough.
Thank you so much . / Çok teşekkür ederim, sağlıklı günler...
 
Upvote 0
@smozgur,

I followed the steps you wrote and created an enormous file.
Thank you.

Selamlar :)
You're welcome.

Actually, this could be done by downloading all city (already downloaded) and associated districts only once into the settings worksheet and getting the districtId easily instead of making multiple calls to the API. The only drawback might be the updated items in the districts data, however, I don't think there will be frequent updates on that list for this data set. So, this is another way of doing that - in fact, better way since each call to the API takes some time. However, I followed this method since it might be help future readers that works with more dynamic web data.

Glad to hear it helps.
(Teşekkürler)
 
Upvote 1
You have chosen the method would be the method, I would have preferred too.
It seems we have a lot to learn from you.
I'm going to bother you from time to time, I hope you don't mind.

(Sağlıcakla)
 
Upvote 0

Forum statistics

Threads
1,215,836
Messages
6,127,173
Members
449,368
Latest member
JayHo

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top