150#
Posted 2016-9-18 18:38
Last edited by koshi0413 on 2016-9-18 18:40
Reply to 149# c_c_lai
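The difference is that the User-Agent value changes at random, so a large batch of page fetches does not all carry the same identifying string. The list below has Windows 10, iPhone, iPad, and Linux (Android) entries.
Collect the rest from the web: whenever you come across a new string, add it to the list. The more strings there are to pick from, the better, because only then does the data sent to the other server actually differ from request to request when you fetch many pages in a short time.
This is only the first step, though. What matters more is rotating the IP address at random.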
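As a rough sketch of the IP side, one option is to send each request through a randomly chosen proxy using the `proxies` argument of `requests`. The proxy addresses below are made-up placeholders and `build_request_kwargs` is a hypothetical helper name, neither comes from this thread; the two User-Agent strings are taken from the list in this post.

```python
import random

# Two User-Agent strings taken from the list in this post.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36',
    'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)',
]

# ASSUMPTION: placeholder proxy addresses; replace with proxies you are allowed to use.
PROXIES = [
    'http://proxy1.example.com:8080',
    'http://proxy2.example.com:8080',
]


def build_request_kwargs():
    """Pick a random User-Agent and a random proxy for one request."""
    proxy = random.choice(PROXIES)
    return {
        'headers': {'User-Agent': random.choice(USER_AGENTS)},
        # requests expects a mapping from URL scheme to proxy URL.
        'proxies': {'http': proxy, 'https': proxy},
    }
```

The result can be passed straight into a call, for example `requests.post(url, data=payload, **build_request_kwargs())`, so every request gets its own User-Agent and proxy pair.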
Application example:
```python
import random

import requests

# Pool of User-Agent strings; add more as you collect them.
hs = [
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36',
    'Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7A341 Safari/528.16',
    'Mozilla/5.0 (Linux; U; Android 4.1.2; zh-tw; GT-I9300 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30',
    'Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10',
    'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)',
]

# Pick a User-Agent at random for the first request.
headers = {"User-Agent": random.choice(hs)}
url = 'URL of the first page'  # placeholder
payload = {'download': 'csv',
           'qdate': '105/09/07',
           'selectType': 'ALL'}
res = requests.post(url, headers=headers, data=payload, stream=True)

# Pick again, independently, for the second request.
headers2 = {"User-Agent": random.choice(hs)}
url2 = 'URL of the second page'  # placeholder
payload2 = {'download': 'csv',
            'qdate': '105/09/07',
            'selectType': 'ALL'}
res2 = requests.post(url2, headers=headers2, data=payload2, stream=True)
```
Example of just the random rotation:
```python
import random

# Same pool of User-Agent strings as above.
hs = [
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36',
    'Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7A341 Safari/528.16',
    'Mozilla/5.0 (Linux; U; Android 4.1.2; zh-tw; GT-I9300 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30',
    'Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10',
    'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)',
]

# Print ten random picks to see the header change from one draw to the next.
for i in range(10):
    headers = {"User-Agent": random.choice(hs)}
    print(headers)
```
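Since the pool is meant to grow as new strings are collected, it may be easier to keep it in a plain text file, one User-Agent per line, than to edit the Python list each time. One string per line also rules out the forgotten-comma mistake, where Python silently joins two adjacent string literals into one. The sketch below assumes such a file; the function names and the file name `user_agents.txt` are made up for illustration.

```python
import random
from pathlib import Path


def parse_user_agents(text):
    """Return one User-Agent per non-empty line, skipping '#' comment lines."""
    agents = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith('#'):
            agents.append(line)
    return agents


def load_user_agents(path):
    """Read the pool from a UTF-8 text file (the file layout is an assumption)."""
    return parse_user_agents(Path(path).read_text(encoding='utf-8'))


def random_headers(agents):
    """Build a headers dict with a randomly chosen User-Agent."""
    return {'User-Agent': random.choice(agents)}
```

With that in place, `hs = load_user_agents('user_agents.txt')` followed by `headers = random_headers(hs)` would stand in for the hard-coded list, and adding a newly found string is just a matter of appending a line to the file.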