
[Original] Downloading the daily report of the three major institutional investors' net buy/sell for listed and OTC stocks with Python

A few days ago when c_c_lai replied with some VBA code, it made me think: what if I took some of the problems from the Excel board and tried solving them in Python, so that anyone interested could compare the two approaches side by side?
Most scraping problems don't sound hard in the abstract, but I have only ever done simple applications and my experience is limited; a reasonably complete scrape is bound to run into a great many fiddly details.
I happen to have more free time lately, so this is a good chance to practice. Everyone on the board is welcome to study and discuss this together, and if there are veteran Python users here, please don't hesitate to offer me pointers and suggestions.
To start, I thought I'd begin with this thread: http://forum.twbts.com/thread-18259-1-2.html.
GBKEE has already provided an excellent VBA solution there, but for a change of flavor we can still try crawling the data down with Python.

URL: http://church.oursweb.net/slocation.php?w=1&c=TW&a=&t=&p=
[Screenshot_1.png, 2016-9-10 21:08]

The page doesn't appear to use any strange JavaScript, so if all we want is the simple data shown in the screenshot, this is enough:
# prepare the tools
import requests
from bs4 import BeautifulSoup

# create a session
s = requests.session()

for i in range(11, 14):
    # only the last number in the URL changes; there are 286 pages,
    # so range(1, 287) walks every one of them
    # while testing, please don't crawl too many pages -- something
    # like (1, 3), (11, 14) or (280, 283) is enough
    url = 'http://church.oursweb.net/slocation.php?w=1&c=TW&a=&t=&p=' + str(i)
    res = s.get(url)
    # if res.text comes back garbled, set the encoding
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'lxml')

    # grab the data; everything we want lives in these two kinds of tr
    data = soup.find_all('tr', ['tb__a', 'tb__b'])

    # print it out
    for d in data:
        print(d.text)
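A side note on the `res.encoding` line above: requests decodes the response body using the charset announced in the HTTP headers, and when that header is missing or wrong, `res.text` comes out garbled. Forcing the right codec only changes the decoding, never the bytes — a stdlib-only sketch of the effect:

```python
# The same bytes decoded with two different codecs: only the correct
# one reproduces the original text.
raw = '教會'.encode('utf-8')   # "church" as UTF-8 bytes
print(raw.decode('utf-8'))     # the original text
print(raw.decode('latin-1'))   # decodes without error, but into mojibake
```

requests also offers `res.apparent_encoding`, a guess based on the body bytes, which can be assigned to `res.encoding` when the headers lie.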
What about collecting the detailed data, then? For that we need the link to each individual church on every page, and then crawl them one by one.


This post was last edited by zyzzyva on 2016-9-11 08:06

Reply to #62 c_c_lai
Sank like a stone... because there really was no data XD
I don't know why the forum system alters code when it's pasted, but the line
data = soup.find_all('tr',['tb__a', 'tb__b'])
should actually read as in the screenshot below:
[Screenshot_2.png, 2016-9-11 08:05]

Fix that line and it should work normally.
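Whatever the forum garbled, the shape of that line is BeautifulSoup's `find_all` with a list as the class filter: it matches `tr` elements whose class is any entry in the list. A minimal sketch — the class names below are made up, not the site's real ones:

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr class="tb_a"><td>row A</td></tr>
  <tr class="tb_b"><td>row B</td></tr>
  <tr class="other"><td>skipped</td></tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')
# A list as the second argument means "class is any of these".
rows = soup.find_all('tr', ['tb_a', 'tb_b'])
print([r.text.strip() for r in rows])  # → ['row A', 'row B']
```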


This post was last edited by zyzzyva on 2016-9-11 08:45

At this point we are still a fair distance from the detailed data GBKEE's scraper produced.
Our next step is to collect the link to each church on the page:
import requests
from bs4 import BeautifulSoup

s = requests.session()

for i in range(11, 13):
    url = 'http://church.oursweb.net/slocation.php?w=1&c=TW&a=&t=&p=' + str(i)
    res = s.get(url)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'lxml')

    # -------- changed below: pull out the link to each church on the page
    for d in soup.select('a[href^="church.php?pkey"]'):
        myUrl = 'http://church.oursweb.net/' + d.get('href')
        print(myUrl)  # print the links first to test
        # get_detail(myUrl, s)  # later a function will process each page
Next comes the function itself; having looked at the page, that is probably where the messy part will be.


Reply to #64 c_c_lai
I'll make this one fairly complete; the goal is detailed per-church data like what GBKEE's scraper pulled down.


This post was last edited by zyzzyva on 2016-9-11 10:41

Since I'd like the final output to be a CSV file (the plan is to use the csv module's DictWriter and write the records as key:value pairs), the data has to be cleaned up first.
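The DictWriter plan mentioned here, in miniature — the field names below are placeholders, not the site's real ones:

```python
import csv

# Each dict becomes one CSV row; fieldnames fixes the column order
# and supplies the header line.
rows = [
    {'id': '811223', 'category': 'training'},
    {'id': '811224', 'category': 'church'},
]
with open('demo.csv', 'w', newline='', encoding='utf-8') as f:
    w = csv.DictWriter(f, fieldnames=['id', 'category'])
    w.writeheader()
    w.writerows(rows)
```

`newline=''` is required when handing an open file to the csv module; without it, extra blank lines can appear on Windows.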
First, trying to print the data out directly:
[Screenshot_1.png, 2016-9-11 10:02]

That looks workable. Next, trying to put the data into a list:
[Screenshot_2.png, 2016-9-11 10:03]

Now that is ugly. It's probably because all of the data sits inside a single td, interleaved with blanks and line breaks; I honestly didn't know where to start.
[Screenshot_3.png, 2016-9-11 10:13]

Switching approach: extract the strings first and drop the whitespace along the way. That looks much better; this is data I can actually handle.
(It costs an extra list, which feels like a detour; there's probably a cleaner way, but nothing comes to mind for now. Make it work first.)
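The string extraction in question is BeautifulSoup's `stripped_strings` generator (used in the function below): it walks every text fragment inside a tag, strips surrounding whitespace, and skips fragments that were whitespace only. A minimal sketch with made-up content:

```python
from bs4 import BeautifulSoup

html = "<td> 建檔 ID：811223 <br/>  分類：教育訓練 <br/>   </td>"
soup = BeautifulSoup(html, 'html.parser')
# Each text node is stripped; the whitespace-only one disappears.
print(list(soup.td.stripped_strings))  # → ['建檔 ID：811223', '分類：教育訓練']
```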
Since the format I ultimately want is [ {dic1}, {dic2}, {'建檔 ID': '811223', '分類': '教育訓練', ...}, {dic4}, ... ], a couple of problems remain:
1. the '回報資料錯誤' ("report a data error") link is not data we want;
2. fields such as '電郵' (email) and '網址' (website) are split, with the actual value landing in the next list element. (The same goes for '宗派' (denomination) and '母會' (mother church); each church's record differs slightly.)
First, prepare a few containers — one empty dictionary and two empty lists. These must live outside the function:
myDict = {}
myList = []
tmpList = []
def get_detail(url, s):
    print(url)
    res = s.get(url)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'lxml')
    detail = soup.find_all('td', 'church_detail')
    for ddd in detail[0].stripped_strings:
        if '回報資料錯誤 >' in ddd:
            # the "report a data error" link -- skip it
            continue
        else:
            tmpList.append(ddd)  # everything else goes into tmpList

    for i, s in enumerate(tmpList):
        if s == "":
            # empty string -- skip
            continue
        elif any(x in s for x in ['電郵', '網址', '宗派', '母會']):
            # for these labels, strip the fullwidth '：' and use the label
            # as key, with the NEXT element of tmpList as the value
            myDict[s.replace('：', '')] = tmpList[i+1]
        else:
            try:
                # everything else splits on the fullwidth '：':
                # left part becomes the key, right part the value
                myDict[s.split('：')[0]] = s.split('：')[1]
            except:
                # on any exception, skip the item
                pass
    # put the whole myDict into myList
    myList.append(myDict)
With that, get_detail should be more or less done; I'll test it this afternoon when I have time.


The comments made it a bit cluttered, so here is a pure-code version.
def get_detail(url, s):
    print(url)
    res = s.get(url)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'lxml')
    detail = soup.find_all('td', 'church_detail')
    for ddd in detail[0].stripped_strings:
        if '回報資料錯誤 >' in ddd:
            continue
        else:
            tmpList.append(ddd)

    for i, s in enumerate(tmpList):
        if s == "":
            continue
        elif any(x in s for x in ['電郵', '網址', '宗派', '母會']):
            myDict[s.replace('：', '')] = tmpList[i+1]
        else:
            try:
                myDict[s.split('：')[0]] = s.split('：')[1]
            except:
                pass
    myList.append(myDict)


Reply to #70 c_c_lai
[Screenshot_4.png, 2016-9-11 17:31]

tmpList is not defined.
Remember to add, outside the function:
myDict = {}
myList = []
tmpList = []
Then that print(myUrl) can be commented out.


Reply to #73 c_c_lai
Where get_detail is placed also matters: Python only looks the name up when the call executes, so if the function hasn't been defined by that point, you get an error.
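The ordering point can be shown in a few lines: a function name is resolved only when the call runs, so a call that executes before its `def` has run raises NameError (`greet` here is a made-up example):

```python
# Calling before the def line has executed fails at runtime.
try:
    greet()
except NameError as e:
    print('not defined yet:', e)

def greet():
    return 'hi'

print(greet())  # fine: by now the def has executed
```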
I fixed a bug (the '會址' (address) field wasn't handled before) and added the output stage.
I've tidied up the current code; please test again with this version.
import requests
from bs4 import BeautifulSoup
import csv


def get_detail(url, s):
    print(url)
    res = s.get(url)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'lxml')
    detail = soup.find_all('td', 'church_detail')
    for ddd in detail[0].stripped_strings:
        if '回報資料錯誤 >' in ddd:
            continue
        else:
            tmpList.append(ddd)

    for i, s in enumerate(tmpList):
        if s == "":
            continue
        elif any(x in s for x in ['電郵', '網址', '宗派', '母會']):
            myDict[s.replace('：', '')] = tmpList[i+1]
        elif '會址' in s:
            # the address wraps onto the next element; glue the two parts
            myDict[s.split('：')[0]] = (s.split('：')[1] + tmpList[i+1])
        else:
            try:
                myDict[s.split('：')[0]] = s.split('：')[1]
            except:
                pass
    myList.append(myDict)


myDict = {}
myList = []
tmpList = []
s = requests.session()

for i in range(1, 2):
    url = 'http://church.oursweb.net/slocation.php?w=1&c=TW&a=&t=&p=' + str(i)
    res = s.get(url)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'lxml')

    for d in soup.select('a[href^="church.php?pkey"]'):
        myUrl = 'http://church.oursweb.net/' + d.get('href')
        get_detail(myUrl, s)
        myDict = {}   # reset for the next church
        tmpList = []  # clear the buffer so churches don't bleed together

with open('gospel2.csv', 'a', newline='', encoding='utf-8') as f:
    fieldnames = ['建檔 ID', '中文名稱', '英文名稱', '分類', '宗派', '母會',
                  '網址', '國別區域', '設立時間', '負責人', '電話', '傳真',
                  '電郵', '會址', '通訊處']
    w = csv.DictWriter(f, fieldnames)
    w.writeheader()
    w.writerows(myList)


Reply to #75 c_c_lai
Same problem as this morning — I don't know whether it's a bug or something deliberate, but the forum refuses the English word for 「線」 ("line"), so newline='' got posted as new=''.
[Screenshot_1.png, 2016-9-11 18:23]
Change that spot back and it should work.


This post was last edited by zyzzyva on 2016-9-11 18:45

Reply to #77 c_c_lai
That should be working now. Open the file in Excel to check; if it shows mojibake, open it in Notepad, save it once, and reopen it.
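On my understanding of Excel's CSV handling, an alternative to the Notepad round-trip is writing the file with the 'utf-8-sig' codec, which prepends a byte-order mark that Excel uses to detect UTF-8 (the file name and row contents below are made up):

```python
import csv

rows = [['建檔 ID', '中文名稱'], ['811223', 'demo']]
# 'utf-8-sig' writes a BOM first, so Excel opens the file as UTF-8
# directly, without the save-and-reopen detour.
with open('gospel2_excel.csv', 'w', newline='', encoding='utf-8-sig') as f:
    csv.writer(f).writerows(rows)
```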

