Golf majors; retreive data from Wikipedia articles

DRSteele

Well-known Member
Joined
Mar 31, 2015
Messages
2,640
Office Version
  1. 365
Platform
  1. Windows
Can someone please help me?

I am trying to use PowerQuery to create a table in Excel of all the golf majors. I wish to extract their dates from the Wikipedia articles for each tournament.

For example, from this web address [FONT=Verdana,Arial,Tahoma,Calibri,Geneva,sans-serif]https://en.wikipedia.org/wiki/1895_U.S._Open_(golf) I want the contents of the Dates field in the box on the upper right side, October 4 1895 in this case. Then 1896, then 1897 all the way to 2019. I reckon it's just some kind of parameter process but I can't seem to get it right.[/FONT]
 

Excel Facts

Can you AutoAverage in Excel?
There is a drop-down next to the AutoSum symbol. Open the drop-down to choose AVERAGE, COUNT, MAX, or MIN
Unless I'm missing something, that page does not contain the dates. It is important that I have a method of retreiving various pieces of data from scores of web pages, not just the dates.
 
Upvote 0
maybe something like this:

Code:
[SIZE=1]// Select
let
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Invoked Custom Function" = Table.AddColumn(Source, "fnUSOpen", each fnUSOpen([Select])),
    #"Expanded fnUSOpen" = Table.ExpandTableColumn(#"Invoked Custom Function", "fnUSOpen", {"Column1", "Column2"}, {"Column1", "Column2"})
in
    #"Expanded fnUSOpen"

// fnUSOpen
let
    YEAR=(year)=>
let
    Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/"&(Text.From(year))&"_U.S._Open_(golf)")),
    Data0 = Source{0}[Data]
in
    Data0
in
    YEAR[/SIZE]

SelectSelectColumn1Column2
20182018Tournament informationTournament information
2018DatesJune 14–17, 2018
2018LocationShinnecock Hills, New York
40°53'38”N 72°26'24”W? / ?40.894°N 72.440°WCoordinates: 40°53'38”N 72°26'24”W? / ?40.894°N 72.440°W
2018Course(s)Shinnecock Hills Golf Club
2018Organized byUSGA
2018Tour(s)PGA Tour
European Tour
Japan Golf Tour
2018StatisticsStatistics
2018Par70
2018Length7,440 yards (6,800 m)
2018Field156 players, 67 after cut
2018Cut148 (+8)
2018Prize fund$12,000,000
€10,244,150
2018Winner's share$2,160,000
€1,843,947
2018ChampionChampion
2018Brooks KoepkaBrooks Koepka
2018281 (+1)281 (+1)
2018Location MapLocation Map
2018Shinnecock Hills
Location in the United States
Shinnecock Hills
Location in New York
Shinnecock Hills
Location in the United States
Shinnecock Hills
Location in New York
2018‹ 2017
2019 ›
‹ 2017
2019 ›
 
Last edited:
Upvote 0
Thanks Sandy. I had to alter and manipulate all kinds of things in the query, but I finally got 448 dates for golf's four major tournaments into one table. Not all years' webpages for each tourney are identical in shape and layout, so it took some doing.

It's still a mess from a formatting standpoint and dates before 1900 don't work in Excel. Certainly though this was much quicker than transcribing them all manually...wow. Much obliged.
 
Last edited:
Upvote 0
structure of the tables before 1987 is different than >=1987

SelectTitleLabelDetail
1897​
Tournament informationDatesSeptember 17, 1897
1897​
Tournament informationLocationWheaton, Illinois
1897​
Tournament informationCourse(s)Chicago Golf Club
1897​
Tournament informationOrganized byUSGA
1897​
Tournament informationFormatStroke play - 36 holes
1897​
StatisticsField35
1897​
StatisticsPrize fund$335
1897​
StatisticsWinner's share$150


SelectTitleLabelDetail
1996​
Tournament informationTournament information
1996​
DatesJune 13–16, 1996
1996​
LocationBloomfield Hills, Michigan
1996​
Course(s)Oakland Hills Country Club
South Course
1996​
Organized byUSGA
1996​
Tour(s)PGA Tour
1996​
StatisticsStatistics
1996​
Par70
1996​
Length6,974 yards (6,377 m)
1996​
Field156 players, 108 after cut
1996​
Cut148 (+8)
1996​
Prize fund$2.4 million
1996​
Winner's share$425,000
1996​
ChampionChampion
1996​
Steve JonesSteve Jones
1996​
278 (-2)278 (-2)
1996​
‹ 1995
1997 ›
‹ 1995
1997 ›
 
Last edited:
Upvote 0
After posting I realized structure of the tables are different but as I can see you almost solved the problem
You didn't say what you are tryin' to achieve so M-code is simply for each single year
I used DV List to choose each year so it's a bit different from what you need
 
Upvote 0
in single table for all years
here is just breakthrough between 1986 and 1987

1987Tournament informationTournament information
1987DatesJune 18–21, 1987
1987LocationSan Francisco, California
1987Course(s)Olympic Club, Lake Course
1987Organized byUSGA
1987Tour(s)PGA Tour
1987FormatStroke play
1987StatisticsStatistics
1987Par70
1987Length6,709 yards (6,135 m)
1987Field156 players, 77 after cut
1987Cut147 (+7)
1987Prize fund$825,000
1987Winner's share$150,000
1987ChampionChampion
1987Scott SimpsonScott Simpson
1987277 (-3)277 (-3)
1987‹ 1986
1988 ›
‹ 1986
1988 ›
1986Tournament informationDatesJune 12–15, 1986
1986Tournament informationLocationShinnecock Hills, New York
1986Tournament informationCourse(s)Shinnecock Hills Golf Club
1986Tournament informationOrganized byUSGA
1986Tournament informationTour(s)PGA Tour
1986StatisticsPar70
1986StatisticsLength6,912 yards (6,320 m)
1986StatisticsField155 players, 70 after cut
1986StatisticsCut150 (+10)
1986StatisticsPrize fund$700,000
1986StatisticsWinner's share$115,000
 
Upvote 0
and end result :)

1988DatesJune 16–20, 1988
1988LocationBrookline, Massachusetts
1988Course(s)The Country Club
Composite Course
1988Tour(s)PGA Tour
1988Par71
1988Length7,010 yards (6,410 m)
1988Field156 players, 65 after cut
1988Cut146 (+4)
1988Prize fund$1.0 miilion
1988Winner's share$180,000
1988Curtis StrangeCurtis Strange
1988278 (-6), playoff278 (-6), playoff
1988‹ 1987
1989 ›
‹ 1987
1989 ›
1987DatesJune 18–21, 1987
1987LocationSan Francisco, California
1987Course(s)Olympic Club, Lake Course
1987Organized byUSGA
1987Tour(s)PGA Tour
1987FormatStroke play
1987Par70
1987Length6,709 yards (6,135 m)
1987Field156 players, 77 after cut
1987Cut147 (+7)
1987Prize fund$825,000
1987Winner's share$150,000
1987Scott SimpsonScott Simpson
1987277 (-3)277 (-3)
1987‹ 1986
1988 ›
‹ 1986
1988 ›
1986DatesJune 12–15, 1986
1986LocationShinnecock Hills, New York
1986Course(s)Shinnecock Hills Golf Club
1986Organized byUSGA
1986Tour(s)PGA Tour
1986Par70
1986Length6,912 yards (6,320 m)
1986Field155 players, 70 after cut
1986Cut150 (+10)
1986Prize fund$700,000
1986Winner's share$115,000
1985DatesJune 13–16, 1985
1985LocationBloomfield Hills, Michigan
1985Course(s)Oakland Hills Country Club, South Course
1985Tour(s)PGA Tour
1985Par70
1985Length6,996 yards (6,397 m)
1985Field156 players, 66 after cut
1985Cut146 (+6)
1985Prize fund$650,000
1985Winner's share$103,000
 
Upvote 0
Sandy, thanks for your dedication here. I was able to create a column under the field Select containing all the dates from 1860 onward. When the Query was invoked, some of the webpages reported an error becsue they don't exist, notably for the war years like 1917 and 1943. I then altered the M-code to reflect each of the four tournaments and to report the Dates only. As you noticed for the US_Open, there were years for each tournament when the Query stopped working and needed to have its columns adjusted. After these tinkerings, the Query reported a long column of Dates for each tournament and I now have 448 dates for the four major tournaments.

Dealing with these dates is the new challenge. Here is a sample of my output. I will need to somehow create a start date and an end date for each tournament, which won't be easy because they are all laid out differently across and within each field.

openusopenpgamasters
17 October 1860October 4, 1895October 10–14, 1916March 22–25, 1934
26 September 1861July 18, 1896September 16–20, 1919April 4–8, 1935
11 September 1862September 17, 1897August 17–21, 1920April 3–6, 1936
18 September 1863June 17–18, 1898August 14–18, 1922April 1–4, 1937
16 September 1864September 14–15, 1899September 24–29, 1923April 2–4, 1938
14 September 1865October 4–5, 1900September 15–20, 1924March 31 – April 2, 1939
13 September 1866June 14–17, 1901September 21–26, 1925April 4–7, 1940
26 September 1867October 10–11, 1902September 20–25, 1926April 3–6, 1941
23 September 1868June 26–29, 1903October 31 –

November 5, 1927
April 9–13, 1942
16 September 1869July 8–9, 1904October 1–6, 1928April 4–7, 1946
15 September 1870September 21–22, 1905December 2–7, 1929April 3–6, 1947
13 September 1872June 28–29, 1906September 8–13, 1930April 8–11, 1948
4 October 1873June 20–21, 1907September 14–19, 1931April 7–10, 1949
10 April 1874August 27–29, 1908August 30 –

September 4, 1932
April 6–9, 1950
10 September 1875June 24–25, 1909August 8–13, 1933April 5–8, 1951
30 September, 2 October 1876June 17–20, 1910July 24–29, 1934April 3–6, 1952
6 April 1877June 23–26, 1911October 17–23, 1935April 9–12, 1953
4 October 1878August 1–2, 1912May 24–30, 1937April 8–12, 1954
27, 29 September 1879September 18–20, 1913July 10–16, 1938April 7–10, 1955
9 April 1880August 20–21, 1914July 9–15, 1939April 5–8, 1956
 
Last edited:
Upvote 0

Forum statistics

Threads
1,214,515
Messages
6,119,970
Members
448,933
Latest member
Bluedbw

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top