WEB SCRAPING

Paulo Silveira Xpto

New Member
Joined
Aug 27, 2020
Messages
11
Office Version
  1. 2016
Platform
  1. Windows
Hello, sorry for my bad english. I'm from Brazil. I've bought the book "Excel 2016 VBA & MACROS" by Bill Jelen and Tracy Syrstad.
Here in Brazil by Ed ALTA BOOKS.

I need help to finish a Web Scraping code.
A basic file in Excel, with the first header row with 4 columns:
Número Antigo - Novo - Vara - Data de Autuação
Starting with row 2 and 1st column, there are a series of numbers to be searched on a specific page, namely.

The idea of the code is to launch the Old Number on the page and bring the New + the other 3 datas from there;
Here is a small sequence of numbers to be mined:
200334000111326
200334000111179
200334000107120
200334000107117
200334000105010

When calling the page in question, the code must:
- Click on the "Número de Processo" button, just below "Opções de pesquisa";
- Take the 1st in the spreadsheet and play it in the COMBO "Número do Processo", under CONSULTA PROCESSUAL;
- Click on the "Pesquisar" button;
- As soon as pg loads, bring from there the value of the field "Nova Numeração" + "Vara" + "Data de Autuação";

I still couldn't click on the 1st button, because before, when there was no loop, the code even brought the data, wrong, but it did.
Hence when the loop is triggered, errors arise.

Here is the code:

Sub TRF1_Click()

Dim IE As InternetExplorer
Set IE = New InternetExplorer
IE.Visible = True
Dim col As Integer
Dim ln As Integer
Dim antigo As String
Dim numprocesso As Variant
Dim novoproc As Variant
Dim vara As Variant
Dim datautua As Variant
Dim botao As Variant
Dim UltCel As Range

IE.Navigate "https://processual.trf1.jus.br/consultaProcessual/numeroProcesso.php?secao=DF"

While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
DoEvents
Wend

ln = 2

col = 1

While Cells(ln, col).Value <> ""

antigo = Cells(ln, col).Value

Set numprocesso = IE.Document.all("Número do Processo")

While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
DoEvents
Wend

numprocesso.Click

IE.Document.all("proc").innerText = antigo

While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
DoEvents
Wend

Set botao = IE.Document.getElementById("enviar")

While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
DoEvents
Wend

botao.Click

While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
DoEvents
Wend

Set novoproc = IE.Document.all.tags("td")(1)
Set vara = IE.Document.all.tags("td")(3)
Set datautua = IE.Document.all.tags("td")(5)

Cells(ln, col + 1) = novoproc.innerText
Cells(ln, col + 2) = vara.innerText
Cells(ln, col + 3) = datautua.innerText

ln = ln + 1

novoproc = vbNullString
vara = vbNullString
datautua = vbNullString
antigo = vbNullString

IE.Quit

DoEvents

Wend

End Sub
 

Excel Facts

Repeat Last Command
Pressing F4 adds dollar signs when editing a formula. When not editing, F4 repeats last command.
Welcome to MrExcel forums.

If you inspect the 'Número do Processo' input element using the browser's developer tools, you'll see that blur, keypress and keyup events are assigned to it. Therefore to automate the page we also have to trigger these events to make it seem like a person is entering the number. It turns out that only the keypress event is required. Also, the form (id="form1") must be submitted, rather than clicking the 'Pesquisar' button.

This code requires references to Microsoft Internet Controls and Microsoft HTML Object Library, set via Tools -> References in the VBA editor.

VBA Code:
'References
'Microsoft Internet Controls
'Microsoft HTML Object Library

Option Explicit

#If VBA7 Then
    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
    Private Declare PtrSafe Function SetForegroundWindow Lib "user32.dll" (ByVal hwnd As LongPtr) As LongPtr
#Else
    Private Declare Sub Sleep Lib "kernel32" (ByVal milliseconds As Long)
    Private Declare Function SetForegroundWindow Lib "user32.dll" (ByVal hwnd As Long) As Long
#End If


Public Sub IE_Search_and_Extract()

    Dim IE As InternetExplorer
    Dim URL As String
    Dim HTMLdoc As HTMLDocument
    #If VBA7 Then
        Dim keypressEvent As DOMKeyboardEvent
    #Else
        Dim keypressEvent As Object
    #End If
    Dim numeroInput As HTMLInputElement
    Dim form As HTMLFormElement
    Dim procDiv As HTMLDivElement
    Dim table As HTMLTable
    Dim numeroRange As Range, numeroCell As Range
       
    With ActiveSheet
        Set numeroRange = .Range("A2", .Cells(.Rows.count, "A").End(xlUp))
    End With
    
    URL = "https://processual.trf1.jus.br/consultaProcessual/numeroProcesso.php?secao=DF"
    
    Set IE = Get_IE_Window("processual.trf1.jus.br")
    If IE Is Nothing Then Set IE = New InternetExplorer
    IE.Visible = True
    SetForegroundWindow IE.hwnd
    
    For Each numeroCell In numeroRange
           
        numeroCell.Offset(, 1).Resize(, 3).Clear
        
        With IE
            .navigate URL
            While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
            Set HTMLdoc = .document
        End With
                
        If keypressEvent Is Nothing Then
            Set keypressEvent = HTMLdoc.createEvent("KeyboardEvent")
            keypressEvent.initEvent "keypress", True, False
        End If
        
        '<input type="text" name="proc" size="25" maxlength="30" id="proc" value="" class="focus {validate:{required:true,minlength:10,messages:{required:'Campo obrigatório.', minlength:'Informe pelo menos {0} dígitos.'}}} valid">
        'Events: blur, keypress, keyup
        
        Set numeroInput = HTMLdoc.getElementById("proc")
        numeroInput.Value = numeroCell.Value
        numeroInput.dispatchEvent keypressEvent
        DoEvents
        Sleep 20
        
        '<form name="form1" class="form-column" id="form1" method="post" action="/consultaProcessual/processo.php" novalidate="novalidate">
        
        Set form = HTMLdoc.getElementById("form1")
        form.submit
        While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE: DoEvents: Wend
        
        '<div class="ui-tabs-panel ui-widget-content ui-corner-bottom" id="aba-processo">

        Do
            Set procDiv = HTMLdoc.getElementById("aba-processo")
            DoEvents
        Loop While procDiv Is Nothing
        Set table = procDiv.getElementsByTagName("TABLE")(0)
        
        numeroCell.Offset(, 1).Value = table.Rows(1).Cells(1).innerText
        numeroCell.Offset(, 2).Value = table.Rows(3).Cells(1).innerText
        numeroCell.Offset(, 3).Value = table.Rows(5).Cells(1).innerText
        
    Next
    
End Sub


Private Function Get_IE_Window(URLorName As String) As SHDocVw.InternetExplorer

    'Look for an IE browser window or tab already open at the (partial) URL or location name and, if found, return
    'that browser as an InternetExplorer object.  Otherwise return Nothing

    Dim Shell As Object
    Dim IE As SHDocVw.InternetExplorer
    Dim i As Variant 'Must be a Variant to index Shell.Windows.Item() array
    
    Set Shell = CreateObject("Shell.Application")
    
    i = 0
    Set Get_IE_Window = Nothing
    While i < Shell.Windows.count And Get_IE_Window Is Nothing
        Set IE = Shell.Windows.Item(i)
        If Not IE Is Nothing Then
            'Debug.Print IE.LocationURL, IE.LocationName, IE.Name
            If IE.Name = "Internet Explorer" And InStr(IE.LocationURL, "file://") <> 1 Then
                If InStr(1, IE.LocationURL, URLorName, vbTextCompare) > 0 Or InStr(1, IE.LocationName, URLorName, vbTextCompare) > 0 Then
                    Set Get_IE_Window = IE
                End If
            End If
        End If
        i = i + 1
    Wend
    
End Function
 
Upvote 0
Thank you very much, the code goes far beyond what I expected or imagined. I will deepen my studies in VBA. You have opened my mind to countless possibilities. No words to thank !!!
 
Upvote 0

Forum statistics

Threads
1,214,585
Messages
6,120,394
Members
448,957
Latest member
Hat4Life

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top