麻辣家族討論版版 - Powered by Discuz! Board

Title: [Question] How to scrape this website's dynamic JavaScript data with VBA?

Author: DanielWONG Time: 2022-11-19 16:43 Title: How to scrape this website's dynamic JavaScript data with VBA?

Website: https://www.hangseng.com/en-hk/personal/insurance-mpf/e-mpf/fund-price-performance/

Target of the automated scraping: the fund prices highlighted in the figure below

[attach]35494[/attach]

Tools: VBA (Excel), Chrome driven by Selenium

The code so far is as follows:

Chrome.ExecuteScript "window.open('https://www.hangseng.com/en-hk/e-services/e-mpf/fund-price-performance/price/','_blank');"
Chrome.SwitchToNextWindow
Chrome.FindElementByXPath("//a[text()='Acknowledge']").Click


Beyond this, I have no idea how to retrieve the specified data. Any help would be appreciated. Thanks a lot!

Author: singo1232001 Time: 2022-11-19 21:13

Last edited by singo1232001 on 2022-11-19 21:17

Reply to 1# DanielWONG

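Sub test2()
    Dim chrome As Object, A_s As Object, td_s As Object, Z As Object
    Dim i As Long, w As Long
    Dim ar(1 To 1000, 1 To 1) As Variant

    Set chrome = CreateObject("Selenium.ChromeDriver")
    chrome.Get "https://www.hangseng.com/en-hk/personal/insurance-mpf/e-mpf/fund-price-performance/"
    chrome.Wait 1000    ' wait one second for the page to load

    ' Click the "Acknowledge" link; start at 1100 to skip the early links
    Set A_s = chrome.FindElementsByTag("a")
    For i = 1100 To A_s.Count
        If A_s(i).Text = "Acknowledge" Then A_s(i).Click: Exit For
    Next

    ' Collect the text of every price cell (td with headers="price_header1")
    Set td_s = chrome.FindElementsByTag("td")
    For Each Z In td_s
        If Z.Attribute("headers") = "price_header1" Then w = w + 1: ar(w, 1) = Z.Text
    Next

    ' Write the results to column A; skip the write if nothing was found
    Cells.ClearContents
    If w > 0 Then Cells(1, 1).Resize(w, 1).Value = ar
End Sub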

Author: DanielWONG Time: 2022-11-20 10:46

Reply to 2# singo1232001

The code is very simple but very effective!
I have two follow-up questions:
1)

Set A_s = chrome.FindElementsBytag("a")
For i = 1100 To A_s.Count


I can find only 139 "<a" tags in the page source, so what does 1100 mean here?

2) For another website, http://www.aastocks.com/tc/stocks/quote/detailchart.aspx?symbol=110000
Is it possible to extract the highlighted RSI data?
[attach]35495[/attach]

Thank you very much!

Author: singo1232001 Time: 2022-11-20 16:29

Last edited by singo1232001 on 2022-11-20 16:34

Reply to 3# DanielWONG

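As I recall, the page has more than 2,000 <a> elements.

The one we want is at roughly position 1275.

Scanning from 1 takes a long time to reach 1275, about 10 to 20 seconds (each element has to be opened and checked, which is slow).

I originally planned to use A_s(1275).Click directly, which is the fastest.

But I was worried that some elements among 1 to 1274 might occasionally disappear or be added,

which would shift the position away from 1275.

So I picked a tolerance and start searching from 1100, skipping 1 to 1099 entirely. That only breaks if the page changes so much that a large number of elements are removed.

This way it is not too slow and still leaves some margin for error.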
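
The trade-off described above (skip a prefix to save time, but keep a margin in case the target shifts) can be sketched outside the browser. The following is a minimal Python sketch, not the VBA used in this thread; Python is used only because it comes up later in the thread as an alternative. The function name `find_from`, the sample lists, and the fallback pass over the skipped prefix are all assumptions added for illustration.

```python
from typing import Optional, Sequence


def find_from(texts: Sequence[str], target: str, start: int) -> Optional[int]:
    """Return the index of the first item equal to target, scanning from start.

    If nothing matches in texts[start:], fall back to the skipped prefix
    texts[:start], so a large page change does not cause a silent miss.
    """
    # Clamp the start offset so a short list cannot cause an index error.
    start = max(0, min(start, len(texts)))
    for i in range(start, len(texts)):
        if texts[i] == target:
            return i
    for i in range(start):
        if texts[i] == target:
            return i
    return None


# Hypothetical page: 2000 links, with the wanted one near position 1275.
links = ["other"] * 2000
links[1274] = "Acknowledge"
print(find_from(links, "Acknowledge", 1100))  # 1274

# Hypothetical shrunken page: the target now sits before the start offset,
# so only the fallback pass over the prefix finds it.
shrunk = ["other"] * 900
shrunk[850] = "Acknowledge"
print(find_from(shrunk, "Acknowledge", 1100))  # 850
```

The fallback pass costs nothing when the target is where it is expected and only adds time in the case where the plain offset scan would have found nothing at all.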

Author: singo1232001 Time: 2022-11-20 21:45

Reply to 3# DanielWONG

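Your second question is an image-recognition problem.
Here is what I found online.

There are a few approaches:
1. Switch to Python and call an image-recognition library directly (simpler).
There are even YouTube videos that walk you through it.
Python covers both image recognition and scraping, and the scraping is done with Selenium just the same.

2. Use VBA to scrape the image, then upload it to Baidu's image-recognition API and pick out the part you need.
I have not actually tried this, so whether it works is a question for someone more experienced.

3. VBA (Shell) + Python
Roughly a combination of 1 and 2; more flexible but more complex.

Let's see whether an expert here knows how,
or ask an expert for help,
or look on 淘x.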
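
Whichever OCR route is chosen, the recognized text still has to be searched for the RSI figure. Below is a minimal Python sketch of that post-processing step only. It assumes the OCR output contains a label such as `RSI(14): 56.78`; that label format, the function name `extract_rsi`, and the sample strings are assumptions, not something confirmed for the chart in question, and the pattern would need adjusting to the real OCR output.

```python
import re
from typing import Optional

# Assumed label shape: "RSI", an optional "(period)", an optional ":" or "=",
# then the value. This is a guess at the OCR output, not a known format.
RSI_PATTERN = re.compile(
    r"RSI\s*(?:\(\s*\d+\s*\))?\s*[:=]?\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def extract_rsi(ocr_text: str) -> Optional[float]:
    """Return the first RSI value found in the OCR text, or None if absent."""
    match = RSI_PATTERN.search(ocr_text)
    if match is None:
        return None
    return float(match.group(1))


# Hypothetical OCR output strings.
print(extract_rsi("SMA10 123.4  RSI(14): 56.78  MACD 0.12"))  # 56.78
print(extract_rsi("no indicator text here"))  # None
```

Keeping the parsing separate from the OCR call means the same function works whether the text comes from a local library or from an online OCR service.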

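Author: singo1232001 Time: 2022-11-21 17:17

Reply to 3# DanielWONG

This is a very bare-bones approach, so it will fail some of the time.

It all depends on how well the OCR website performs.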
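
Since the OCR step can misread the image, one common safeguard is to reject implausible values and try again. The following is a minimal Python sketch under that assumption. It relies on the fact that RSI always lies between 0 and 100; the function name `read_with_retry` and the fake reader are hypothetical stand-ins for "screenshot the chart, run OCR, parse the number".

```python
from typing import Callable, Optional


def read_with_retry(read_once: Callable[[], Optional[float]],
                    attempts: int = 3) -> Optional[float]:
    """Call read_once up to `attempts` times and return the first plausible RSI.

    RSI is bounded to the range 0..100, so a missing value or one outside
    that range is treated as an OCR misread and triggers another attempt.
    """
    for _ in range(attempts):
        value = read_once()
        if value is not None and 0.0 <= value <= 100.0:
            return value
    return None


# Hypothetical reader: fails once, misreads once, then succeeds.
fake_results = iter([None, 560.2, 56.02])
print(read_with_retry(lambda: next(fake_results)))  # 56.02
```

A range check cannot catch every misread (for example 56.02 read as 50.02), so it reduces the failure rate rather than eliminating it.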

Author: DanielWONG Time: 2022-11-27 12:11

Thanks a lot! I found that if I switch to this OCR site, https://ocr.space/,
the code works well.
Respect for your nice work!

Reply to 6# singo1232001

Welcome to 麻辣家族討論版版 (http://forum.twbts.com/)