標題:
[發問]
想請教如何在網頁(非表格狀態)抓資料(特定字串)到EXCEL表單
[打印本頁]
作者:
kuhsuanchieh
時間:
2015-9-24 18:48
標題:
想請教如何在網頁(非表格狀態)抓資料(特定字串)到EXCEL表單
已經有很多大大交過怎麼把網頁資料抓到EXCEL表單,但是好像都是抓固定的表單或相對應的欄位,
目前遇到一個問題就是想要在
http://www.tyland.org.tw/pg.asp?theme=11&kinds=2&area=&search=o&review=&model=1&meid=&absolutepage=1
這個網址中,抓取會員的資料,但是會員的資料需要再點選會員姓名後再進入另一個頁面,
http://www.tyland.org.tw/view-m.asp?mno=1
而且我只想要抓取姓名、電話、傳真、地址(開業執照地址)這4個欄位,
可以參閱一下附件,不知道這樣是否有可能一次抓完所有會員的資料。
PS:會員資料可能持續增加或更新。
[attach]22087[/attach]
作者:
GBKEE
時間:
2015-9-25 08:32
回復
1#
kuhsuanchieh
試試看
Option Explicit
'頁數
Const 頁數網址 = "http://www.tyland.org.tw/pg.asp?theme=11&kinds=2&area=&search=o&review=&model=1&meid=&absolutepage="
'會號 ID
Const ID = "http://www.tyland.org.tw/view-m.asp?mno="
Dim Sh As Worksheet, Ie As Object
Sub Ex()
Ex現在會員名錄
Ex_所有會員資料
End Sub
Sub Ex現在會員名錄()
Dim i As Integer, xTable As Object, r As Integer
Set Ie = CreateObject("InternetExplorer.Application")
Set Sh = Sheets(1)
With Sh
.UsedRange.Clear
.[A1:E1] = Array("ID", "姓名", "電話", "傳真", "地址")
'.[A1:D1] = Array( "姓名", "電話", "傳真", "地址")
.Activate
End With
With CreateObject("InternetExplorer.Application")
' .Visible = True
For i = 1 To Max_Page
.Navigate 頁數網址 & i
Do While .Busy Or .readyState <> 4: DoEvents: Loop
Set xTable = .Document.all.tags("table")(0).Rows
Application.StatusBar = 頁數網址 & i & " 載入..."
For r = 1 To xTable.Length - 1
Ex_現在會員資料 ID & xTable(r).Cells(1).INNERTEXT
Next
Next
.Quit
End With
Ie.Quit
Set Ie = Nothing
End Sub
Sub Ex_現在會員資料(URL As String)
Dim ID As String, i As Integer, E As Variant, ii As Integer, t As Variant, AR()
ID = "http://www.tyland.org.tw/view-m.asp?mno="
AR = Array(0, 1, 2, 3, 6) 'AR = Array( 1, 2, 3, 6) 不要"ID"
With Ie
' .Visible = True
.Navigate URL
Do While .Busy Or .readyState <> 4: DoEvents: Loop
t = Split(.Document.BODY.INNERTEXT, vbLf) '網頁的文字,vbLf 切割為陣列
With Sh.Range("A" & Rows.Count).End(xlUp).Offset(1)
.Select '可不用
For ii = 0 To 4
.Cells(1, ii + 1) = Split(t(AR(ii)), ":")(1)
Next
End With
End With
End Sub
Function Max_Page() As Integer '傳回會員名錄的總頁數
Dim E As Object
With CreateObject("InternetExplorer.Application")
' .Visible = True
.Navigate "http://www.tyland.org.tw/pg.asp?theme=11"
Do While .Busy Or .readyState <> 4: DoEvents: Loop
For Each E In .Document.all.tags("A")
If InStr(E.INNERTEXT, "最後一頁") Then
Max_Page = Replace(E.href, 頁數網址, "") '網頁字串最後的數字
Exit For
End If
Next
.Quit '關閉網頁
End With
End Function
'****************************************************
Sub Ex_所有會員資料()
Dim i As Integer, E As Variant, ii As Integer, t As Variant, AR()
Dim Sh As Worksheet
Set Sh = Sheets(2)
With Sh
.UsedRange.Clear
.[A1:E1] = Array("ID", "姓名", "電話", "傳真", "地址")
'.[A1:D1] = Array( "姓名", "電話", "傳真", "地址")
.Activate
End With
AR = Array(0, 1, 2, 3, 6) 'AR = Array( 1, 2, 3, 6) '不要"ID"
With CreateObject("InternetExplorer.Application")
' .Visible = True
For i = 1 To Max_Id
.Navigate ID & i
Do While .Busy Or .readyState <> 4: DoEvents: Loop
Application.StatusBar = ID & i & " 載入..."
t = Split(.Document.BODY.INNERTEXT, vbLf)
If UBound(t) > -1 Then
With Sh.Range("A" & Rows.Count).End(xlUp).Offset(1)
.Select '可不用
For ii = 0 To 4 'For ii = 0 To 3 '不要"ID
.Cells(1, ii + 1) = Split(t(AR(ii)), ":")(1)
Next
End With
End If
Next
.Quit
End With
End Sub
Function Max_Id() As Integer '查找最新會員的會號
Dim E As Object
With CreateObject("InternetExplorer.Application")
' .Visible = True
.Navigate "http://www.tyland.org.tw/pg.asp?theme=11"
Do While .Busy Or .readyState <> 4: DoEvents: Loop
For Each E In .Document.all.tags("A")
If InStr(E.INNERTEXT, "最後一頁") Then
E.Click '按下 "最後一頁"
Exit For
End If
Next
Do While .Busy Or .readyState <> 4: DoEvents: Loop
Set E = .Document.all.tags("table")(0).Rows
Max_Id = E(E.Length - 1).Cells(1).INNERTEXT '最新會員的會號
.Quit '關閉網頁
End With
End Function
複製代碼
作者:
kuhsuanchieh
時間:
2015-9-25 13:15
回復
2#
GBKEE
感謝 GBKEE 版主的說明,但是我想請教中間有關「傳回會員名錄的總頁數」這個部分,
如果不要有這個動作,是否就無法全數抓回來?那我可以下
i=1 to 9999
這樣子嗎?
作者:
GBKEE
時間:
2015-9-25 15:29
本帖最後由 GBKEE 於 2015-9-25 15:32 編輯
回復
3#
kuhsuanchieh
For i = 1 To Max_Id
.Navigate ID & i
複製代碼
Sub Ex_所有會員資料().可以改成如此,
會員號碼最後是 1407,9999會多跑很久的
For i = 1 To 9999
.Navigate ID & i
複製代碼
作者:
准提部林
時間:
2015-9-25 15:45
超版對網頁非常專業,程式也慎求〔正確完整〕及〔效率〕,
取得〔頁數〕及〔最大ID〕才可做最小的迴圈,認真負責的工程師才會這樣不厭勞煩,佩服!!!
提供另個不專業參考:
Sub TEST()
Dim UL$, STR, FN&, j&, k%, Arr, X&, N&, TM
TM = Time
[Sheet1!A:E].ClearContents: [F1] = ""
[Sheet1!A1:E1] = Array("ID", "姓名", "電話", "傳真", "地址")
'↓取得〔總頁數〕
STR = 網頁原始碼("http://www.tyland.org.tw/pg.asp?theme=11")
STR = Split(Split(STR, ">最後一頁")(0), "absolutepage=")
FN = STR(UBound(STR))
'↓取得〔最大ID編號〕
UL = "http://www.tyland.org.tw/pg.asp?theme=11&kinds=2&area=&search=o&review=&model=1&meid=&absolutepage="
STR = 網頁原始碼(UL & FN)
STR = Split(STR, "view-m.asp?mno=")
FN = Split(STR(UBound(STR)), "'>")(0)
'↓開始擷取資料(含所有資料)
ReDim Arr(1 To FN, 1 To 5)
For j = 1 To FN
Application.StatusBar = "■■■■■■資料擷取中:" & j & "/" & FN
STR = 網頁原始碼("http://www.tyland.org.tw/view-m.asp?mno=" & j)
If Len(STR) = 0 Then GoTo 101
X = InStr(STR, "<li>會 號:")
If X = 0 Then GoTo 101
STR = Split(Mid(STR, X), "</li>")
N = N + 1
For k = 1 To 4 'ID.姓名.電話.傳真
Arr(N, k) = Trim(Split(STR(k - 1), ":")(1))
Next k
Arr(N, 5) = Trim(Split(STR(6), ":")(1)) '地址
101: Next j
If N > 0 Then [A2:E2].Resize(N) = Arr
[Sheet1!F1] = Format(Time - TM, "hh:mm:ss")
Application.StatusBar = False
Beep
End Sub
'=============副程式
Function 網頁原始碼(xURL$) As String
On Error Resume Next
With CreateObject("MSXML2.XMLHTTP")
.Open "POST", xURL, False
.send
網頁原始碼 = .ResponseText
End With
On Error GoTo 0
End Function
複製代碼
載點1:[attach]22092[/attach]
載點2:
http://www.funp.net/840179
作者:
GBKEE
時間:
2015-9-26 10:34
本帖最後由 GBKEE 於 2015-9-26 10:35 編輯
回復
5#
准提部林
愛說笑了版主程式碼,快太多了.
CreateObject("MSXML2.XMLHTTP")本文傳送完,ie還在開啟等候中...
作者:
kuhsuanchieh
時間:
2015-9-30 12:47
回復
5#
准提部林
看完兩位的程式,小弟只能說五體投地的佩服,
再加上那個時間,我不禁笑出來了!
感謝兩位大大的指導,我看我又要好好研究一兩個禮拜了!
歡迎光臨 麻辣家族討論版版 (http://forum.twbts.com/)