Importing data from a webpage. I need an expert!

Jimatdks

New Member
Joined
Feb 15, 2010
Messages
49
Hello everyone!

I am trying to import data from a webpage. I have written some code that works, but because of the amount of data I am trying to upload i really need to streamline the code. I am looking for any suggestions.

The website i am trying to upload information from:

http://www4.ncdc.noaa.gov/cgi-win/wwcgi.dll?wwevent~ShowEvent~842907

You can see that the end of the URL is a number. This number corresponds to an event number....there are about 850,000 of them! And I need to import all of them...

I know that one excel workbook can not hold all of the lines, so the code will have to upload as many as possible and then open a new workbook and continue.

If you go through a few of the events (just increase or decrease the event number, i.e. 842906 instead of 842907 in the URL above.) you will see that there are different event types. I am only interested in Hail and Wind events. There are several different types of wind events...thunderstorm wind, high wind, etc. (You can go to this website and look at some of the events: http://www4.ncdc.noaa.gov/cgi-win/wwcgi.dll?wwEvent~storms You will have to pick a state. You will then see a new page where you enter some additional data and see the different event types). I thought that the code could import maybe 1000 events, remove all of the non-hail/wind events, and then continue. This would reduce the number of lines that would have to be in each workbook.

I have inserted the code I have so far below. You will see that the only fields that I need from the webpage are: event type, event date, lat/long, magnitude, state, and county. I wouldn't mind getting everything and then removing unnecessary fields later.

The code uses the "Data>From Web" type of import (i am sure there is a name for this...i just dont know what it is.). Once the data is in the workbook, I copy the fields of interest into new cells. I then move on to the next event number, and repeat, adding a new row of data each time.

Please let me know if you can help! Thanks in advance.

Code:

Sub Macro1()

Dim eventtype
Dim eventdate
Dim latlong
Dim mag
Dim state
Dim county

eventnum = 1
rownum = 14

Do

Range("A1").Select

With ActiveSheet.QueryTables.Add(Connection:= _
"URL;http://www4.ncdc.noaa.gov/cgi-win/wwcgi.dll?wwevent~ShowEvent~" & eventnum, _
Destination:=Range("$A$1"))
.Name = "wwcgi.dll?wwevent~ShowEvent~866281_1"
.FieldNames = True
.RowNumbers = Flase
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlOverwriteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = "5"
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With

eventtype = Range("B1").Value
eventdate = Range("B2").Value
latlong = Range("B4").Value
mag = Range("B7").Value
state = Range("D1").Value
county = Range("D3").Value


Cells(rownum, 1).Value = eventnum
Cells(rownum, 2).Value = eventtype
Cells(rownum, 3).Value = eventdate
Cells(rownum, 4).Value = latlong
Cells(rownum, 5).Value = mag
Cells(rownum, 6).Value = state
Cells(rownum, 7).Value = county

rownum = rownum + 1
eventnum = eventnum + 1

Loop While eventnum < 5 'this will have to be something like 850,000 when code is complete

End Sub
 

Excel Facts

Can you AutoAverage in Excel?
There is a drop-down next to the AutoSum symbol. Open the drop-down to choose AVERAGE, COUNT, MAX, or MIN
I've written some code to grab the data you need without a web query on a sheet.
One problem I've come across is the different way some events are reported: some have Begin LatLon (event 2) some don't (event 842898). Event 2 has a County, event 842898 has a Zone. Now this could easily be solved it all your required events are laid out in the same way, and you have stipulated that you're interested in Hail and Wind events, but when I look at http://www4.ncdc.noaa.gov/cgi-win/wwcgi.dll?wwevent~storms the Event type dropdown contains:
"Funnel Cloud", "Hail", "High Winds", "Hurricane & Tropical Storm", "Ocean & Lake Surf", "Strong Winds", "Thunderstorm Winds", "Tornado", "Water Spout".
Worse, when I run my code on events 1 to 100 and 842807 to 842907 I see that the types come out as:
Tornado
Tstm Wind
Hail
High Wind
Thunderstorm Wind
Marine Thunderstorm Wind
Heavy Snow
Winter Storm
Strong Wind
Flood
Extreme Cold/wind Chill
Ice Storm
Avalanche
Flash Flood
Wildfire
Heavy Rain
Blizzard
Winter Weather
Cold/wind Chill

These don't correspond very well to the type dropdown, and that being only 200 out of some 850k records is not likely to be inclusive!

Can you help to narrow down possibilities, preferably with a list of event types you want to see, as they appear in the reports in the 'Event:' field, that is the first field in each report/web page?
Then if they're all laid out in the same way, I'll have got the problem licked.
ps. What version of Excel?
 
Upvote 0
p45cal, thanks for the reply!

It sounds like you have started to understand the frustration I have had with this database since I started working with it....

Some background:
I use the NCDC Storm Events Database to query weather events that have occurred in certain areas. Up until recently, I was able to buy the whole database (MS Access) from NCDC and upload to our servers. From there, I have a program that will sort through all of the data and select events based on criteria entered by the user. The queried data is sent to excel. My excel program uses the long/lat data for each of the events and finds the distance from a given address or long/lat. From there, excel formats a table and displays the end result for the user.

Recently, NCDC has either stopped (or is WAY behind on) producing the DVD versions of the database. So, I have decided to just try and pull everything from their website. And that is why we are talking today...

As far as the inconsistencies in the "event type" within the events, I have had this issue already. The query that was done previously used %wind% when searching through the database to pull all "event types" that had the word "wind" within the field...not sure if this is possible in what we are trying to do , but I think it is the best way to ensure that all "wind" events are imported.

Although Tornadoes are technically considered wind events, I dont need them for this specific use. From the drop down box on the NCDC web page, I am only interested in the following:
Hail
High Winds
Strong Winds
Thunderstorm Winds

I ran a query on our old data set and found the following event types:
High Wind
High Winds
Thunderstorm Winds
Thunderstorm Wind
Thunderstorm Windss
TSTM Wind
TSTM Winds
Strong Wind
Strong Winds

I am not sure if this is inclusive and it appears that some of the entries have typos (windss).

For the beginning/end information, I am only interested in the beginning. If it doesn't have a value, I would just want it to return a 0, or similar.

It looks like the event specific webpage source code has a spot for each of the fields of interest. How does your code work? Is there a way to reference, for example, "Begin Location:" in the source code and have excel pull the value that comes after? Hopefully we can troubleshoot this issue together!

As far as excel, I am using 2007. Sorry i didnt include that in my last post....i thought it was in my signature. Guess not!

Let me know if you have any other questions. I will try and help as much as possible. And again, thanks for helping me out with this!
 
Upvote 0
I've managed to do a bit of coding, have to break off for a while, so some code to get your teeth into in the meantime:
This code adapted from http://scriptorium.serve-it.nl/view.php?sid=40) will add a sheet (if it doesn't already exist) called EventTypes. It will go through the events numbers in:
For i = 897201 To 897401 'adjust (on 27 Jan 2012 max. was 897401)
and only add an event number in column A and and corresponding event type in column B if the event type is not already to be found in column B. Subsequent reruns will not overwrite existing data, only add it if not already there in column B.
So this creates a unique list of event types for you to explore as well as their event number.

However, it goes at about the rate of 2 events per second, so if you want to do the lot it's going to takes 5 days+ !

It shows the event number that it's processing in the Status bar at the bottom of the excel sheet window, because largely, there'll be little other activity to see.

However, you'll be able to do the whole database in batches.

Later, I'll add all the fields and not exclude any events. Because the time taken to get several fields will not be significantly longer than getting just one (most of the time is spent fetching the page), you'll be able to let it run to get your full database, then periodically you'll just run the code again with missing event numbers in the For.. ..Next loop to bring it up to date (seconds or minutes to do).
Then you'll be able filter it/query it etc. locally.
Since you've got Excel 2007 you'll be able to fit all the data onto one sheet easily.

I do feel there might be a faster way and will look for this after I've done the above.

The code:
Code:
Sub GetWeatherEventTypes()
Dim i As Long
Dim sURL As String, sHTML As String
Dim oHttp As Object
Dim lTopicstart As Long, lTopicend As Long
Dim blWSExists As Boolean
'Create a new Worksheet "EventTypes" if it doesnt'exist already.
For i = 1 To Worksheets.Count
  If Worksheets(i).Name = "EventTypes" Then
    blWSExists = True
    Worksheets(i).Activate
    Set DestWS = Worksheets(i)
  End If
Next i
If Not blWSExists Then
  Set DestWS = Worksheets.Add(after:=Worksheets(Worksheets.Count))
  DestWS.Name = "EventTypes"
End If
BaseUrl = "http://www4.ncdc.noaa.gov/cgi-win/wwcgi.dll?wwevent~ShowEvent~"

For i = [COLOR=Blue]897201 To 897401[/COLOR]  'adjust (on 27 Jan 2012 max. was 897401)
  sURL = BaseUrl & i
  ' Create an XMLHTTP object and add some error trapping
  On Error Resume Next
  Set oHttp = CreateObject("MSXML2.XMLHTTP")
  If Err.Number <> 0 Then
    Set oHttp = CreateObject("MSXML.XMLHTTPRequest")
    MsgBox "Error 0 has occured while creating a MSXML.XMLHTTPRequest object"
  End If
  On Error GoTo 0
  If oHttp Is Nothing Then
    MsgBox "For some reason I wasn't able to make a MSXML2.XMLHTTP object"
    Exit Sub
  End If
  'Open the URL in browser object
  oHttp.Open "GET", sURL, False
  oHttp.Send
  sHTML = oHttp.responseText
  'Event Type:
  myType = "reset"
  lTopicstart = InStr(1, sHTML, "Event:", vbTextCompare)
  If lTopicstart > 0 Then
    lTopicstart = InStr(lTopicstart + 6, sHTML, "", vbTextCompare) + 3
    lTopicend = InStr(lTopicstart, sHTML, "", vbTextCompare)
    myType = Mid(sHTML, lTopicstart, lTopicend - lTopicstart)
  Else
    myType = "not found on page"
  End If
  If DestWS.Columns(2).Find(what:=myType, LookIn:=xlFormulas, lookat:=xlWhole) Is Nothing Then
    Set lr = DestWS.Cells(Rows.Count, "A").End(xlUp).Offset(1)
    'lr.Select
    lr.Value = i
    lr.Offset(, 1).Value = myType
  End If
  Application.StatusBar = i
  sHTML = ""
Next i
Set oHttp = Nothing
Application.StatusBar = False
End Sub
 
Last edited:
Upvote 0

Forum statistics

Threads
1,216,561
Messages
6,131,403
Members
449,650
Latest member
Adamd325

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top