
Title: [Question] How to scrape this website dynamic data from Javascript with VBA?

Author: DanielWONG    Time: 2022-11-19 16:43

Website: https://www.hangseng.com/en-hk/personal/insurance-mpf/e-mpf/fund-price-performance/

Target of the automated scraping: the fund prices highlighted in the figure below

[attach]35494[/attach]

Tools: VBA(Excel), Chrome driven by Selenium

The code started as below:
  Chrome.ExecuteScript "window.open('https://www.hangseng.com/en-hk/e-services/e-mpf/fund-price-performance/price/','_blank');"
  Chrome.SwitchToNextWindow
  Chrome.FindElementByXPath("//a[text()='Acknowledge']").Click

After that I have no idea how to retrieve the specified data. Please help. Thanks a lot!
Author: singo1232001    Time: 2022-11-19 21:13

Last edited by singo1232001 on 2022-11-19 21:17

Reply to 1# DanielWONG


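Sub test2()
    Dim chrome As Object, A_s As Object, td_s As Object, Z As Object
    Dim ar() As Variant
    Dim i As Long, w As Long

    Set chrome = CreateObject("selenium.chromedriver")
    chrome.Get "https://www.hangseng.com/en-hk/personal/insurance-mpf/e-mpf/fund-price-performance/"
    chrome.Wait 1000

    ' Click the "Acknowledge" link; start at 1100 to skip the earlier links
    Set A_s = chrome.FindElementsByTag("a")
    For i = 1100 To A_s.Count
        If A_s(i).Text = "Acknowledge" Then A_s(i).Click: Exit For
    Next

    ' Collect the text of every <td> whose headers attribute is "price_header1"
    ReDim ar(1 To 1000, 1 To 1)
    Set td_s = chrome.FindElementsByTag("td")
    For Each Z In td_s
        If Z.Attribute("headers") = "price_header1" Then w = w + 1: ar(w, 1) = Z.Text
    Next

    ' Write the results to column A (skipped when nothing matched, since Resize(0, 1) fails)
    Cells.ClearContents
    If w > 0 Then Cells(1, 1).Resize(w, 1) = ar
End Sub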
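
As a possible variation on the loop over every td above, the filtering can be handed to the browser with an XPath, so that only the matching cells come back to VBA. This is a minimal sketch, not a tested replacement: it assumes the same SeleniumBasic driver object, that the page has already been loaded and "Acknowledge" clicked, and that the price cells still carry headers="price_header1". The names ReadPriceColumn and WritePrices are made up for illustration.

```vba
Option Explicit

' Sketch only: assumes the SeleniumBasic wrapper used above, a driver that has
' already loaded the page and clicked "Acknowledge", and that the price cells
' still carry headers="price_header1". ReadPriceColumn is a made-up name.
Function ReadPriceColumn(driver As Object) As Variant
    Dim priceCells As Object
    Dim prices() As Variant
    Dim i As Long

    ' Let the browser filter the cells instead of testing every <td> in VBA.
    Set priceCells = driver.FindElementsByXPath("//td[@headers='price_header1']")
    If priceCells.Count = 0 Then
        ReadPriceColumn = Empty   ' nothing matched; the layout may have changed
        Exit Function
    End If

    ' Size the array to the actual number of matches (no fixed 1000-row limit).
    ReDim prices(1 To priceCells.Count, 1 To 1)
    For i = 1 To priceCells.Count
        prices(i, 1) = priceCells(i).Text
    Next i
    ReadPriceColumn = prices
End Function

' Hypothetical caller: writes the prices to column A of the active sheet.
Sub WritePrices(driver As Object)
    Dim result As Variant
    result = ReadPriceColumn(driver)
    If IsEmpty(result) Then Exit Sub
    ActiveSheet.Cells(1, 1).Resize(UBound(result, 1), 1).Value = result
End Sub
```

Sizing the array from the match count avoids both the fixed 1000-row buffer and the error raised when nothing is found.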
Author: DanielWONG    Time: 2022-11-20 10:46

Reply to 2# singo1232001


The code is very simple, but very effective!
I have two more questions:
1)
  Set A_s = chrome.FindElementsBytag("a")
      For i = 1100 To A_s.Count

I can find only 139 "<a" tags in the website's source code, so what does 1100 mean here?

2) For another website, http://www.aastocks.com/tc/stocks/quote/detailchart.aspx?symbol=110000
is it possible to extract the highlighted RSI data?
[attach]35495[/attach]

Thank you very much!
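Author: singo1232001    Time: 2022-11-20 16:29

Last edited by singo1232001 on 2022-11-20 16:34

Reply to 3# DanielWONG


As I recall, there are more than 2,000 <a> elements.

The one we want is at roughly position 1275.

Starting the loop from 1 takes a long time to reach 1275, about 10 to 20 seconds (opening and checking every object one by one is slow).

I originally planned to use A_s(1275).click directly, which would be fastest.

But I was worried that some elements among 1 to 1274 might occasionally disappear or be added,

which would shift the position of element 1275.

So I allowed a margin and start searching from 1100, skipping 1 to 1099 entirely. This works unless the page changes drastically and loses a lot of elements.

That way it is not too slow, and there is still some tolerance for change.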
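
The start-offset idea can be sketched without Selenium at all. The example below is a hedged illustration, not the code used in this thread: it searches a plain Collection of strings from a chosen start position, and then (an addition of mine, not something described above) wraps round to the skipped positions as a safety net if the tail search finds nothing. FindFrom and DemoFindFrom are hypothetical names.

```vba
Option Explicit

' Returns the 1-based position of target in items, searching startAt..Count
' first and then 1..startAt-1. Returns 0 when target is not present.
Function FindFrom(items As Collection, target As String, startAt As Long) As Long
    Dim i As Long, first As Long

    ' Fall back to a full scan if the offset is outside the collection.
    first = startAt
    If first < 1 Or first > items.Count Then first = 1

    ' Fast path: the tail, where the item is expected to be.
    For i = first To items.Count
        If items(i) = target Then FindFrom = i: Exit Function
    Next i

    ' Safety net: the positions that were skipped.
    For i = 1 To first - 1
        If items(i) = target Then FindFrom = i: Exit Function
    Next i
End Function

Sub DemoFindFrom()
    Dim links As New Collection
    links.Add "Home": links.Add "Funds": links.Add "Acknowledge": links.Add "Contact"

    Debug.Print FindFrom(links, "Acknowledge", 2)   ' prints 3 (found in the tail)
    Debug.Print FindFrom(links, "Home", 3)          ' prints 1 (found after wrapping round)
    Debug.Print FindFrom(links, "Missing", 1)       ' prints 0 (not present)
End Sub
```

With Selenium elements the comparison would be against each element's Text, so the wrap-round pass only costs time in the rare case where the page has shrunk.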
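Author: singo1232001    Time: 2022-11-20 21:45

Reply to 3# DanielWONG

Regarding the second question, which is about image recognition,
here is what I found online.

There are a few possible approaches:
1. Switch to Python and call an image-recognition library (simpler).
There are even YouTube videos that walk you through it,
covering both image recognition and scraping. The scraping part also uses Selenium.


2. Use VBA to scrape the image, then upload it to the Baidu image-recognition API and pick out the part you need.
I have not actually tried this, so I cannot say whether it works. It would need an expert.

3. VBA (Shell) + Python
Roughly the same as above, combining 1 and 2 in a new way. More complex.

Let's see whether an expert here knows how,
or ask an expert for help,
or look on 淘x.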
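
For approach 3, the VBA side might look roughly like the sketch below. It only shows how VBA could launch a Python script and read its result back. The script path C:\ocr\read_rsi.py, its command-line contract (image path in, text file out), and the name RunPythonOcr are all hypothetical, and no such script is given in this thread. How the chart image gets saved to disk is left out.

```vba
Option Explicit

' Sketch of approach 3 (VBA Shell + Python). The script path and its arguments
' are hypothetical; python is assumed to be on the PATH.
Function RunPythonOcr(imagePath As String, outputPath As String) As String
    Dim sh As Object
    Dim cmd As String
    Dim exitCode As Long
    Dim f As Integer

    Set sh = CreateObject("WScript.Shell")

    ' Builds: python "C:\ocr\read_rsi.py" "<imagePath>" "<outputPath>"
    cmd = "python ""C:\ocr\read_rsi.py"" """ & imagePath & """ """ & outputPath & """"

    ' 0 = hidden window, True = wait until the script has finished.
    exitCode = sh.Run(cmd, 0, True)
    If exitCode <> 0 Then
        Err.Raise vbObjectError + 1, , "OCR script failed with exit code " & exitCode
    End If

    ' Read the whole result file; assumes the script wrote plain ASCII text.
    f = FreeFile
    Open outputPath For Input As #f
    RunPythonOcr = Input$(LOF(f), f)
    Close #f
End Function
```

Passing the result through a text file keeps the two sides loosely coupled, at the cost of a temporary file per call.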
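Author: singo1232001    Time: 2022-11-21 17:17

Reply to 3# DanielWONG


This is a very bare-bones approach, and it will fail some of the time.

It all depends on how well the OCR website performs.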
Author: DanielWONG    Time: 2022-11-27 12:11

Thanks a lot! I found that if I switch to this OCR site, https://ocr.space/,
the code should work well.
Much respect for your nice work!

Reply to 6# singo1232001




Welcome to 麻辣家族討論版版 (http://forum.twbts.com/)