This post was last edited by zyzzyva on 2016-9-11 10:41
The end goal is to export everything to CSV (I plan to use the csv module's DictWriter and write each row as key:value pairs), so the data has to be cleaned up first.
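As a rough sketch of that final step, this is one way a list of dicts could be handed to csv.DictWriter. The helper names (collect_fieldnames, write_csv) and the second sample row are made up for illustration. I am also assuming that different churches have different fields, so the header is built from every key seen.

```python
import csv
import io

# Hypothetical sample rows; the real ones would come from the scraper.
rows = [
    {'建檔 ID': '811223', '分類': '教育訓練'},
    {'建檔 ID': '000000', '電郵': 'info@example.org'},  # made-up record
]

def collect_fieldnames(records):
    """Return every key seen in records, in first-seen order."""
    names = []
    for record in records:
        for key in record:
            if key not in names:
                names.append(key)
    return names

def write_csv(records, fh):
    """Write a list of dicts to an already-open text file as CSV."""
    writer = csv.DictWriter(fh, fieldnames=collect_fieldnames(records), restval='')
    writer.writeheader()
    writer.writerows(records)  # keys missing from a row are filled with restval

buf = io.StringIO()  # stands in for a real file here
write_csv(rows, buf)
print(buf.getvalue())
```

When writing to a real file, the csv documentation recommends opening it with newline=''. An encoding such as 'utf-8-sig' usually helps Excel display the Chinese headers correctly.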
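First I tried printing the data directly.
That looked OK, so next I tried putting the data into a list.
What a mess. It is probably because all of the data sits inside a single td, with spaces and line breaks scattered through it, and I really did not know where to start.
So I changed approach: pull out the strings first and strip the whitespace along the way. That looks much better, and it is now data I can work with.
(This needs one extra list, which feels like a bit of a detour. There is probably a better way, but I have not thought of one yet, so let's get it working first.)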
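To show what "pull out the strings and strip the whitespace" does, here is a small sketch using Beautiful Soup's .strings and .stripped_strings. The HTML below is made up to imitate one detail cell, and the real page's markup may well differ. I use 'html.parser' only so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# Made-up HTML imitating one detail cell; the e-mail address is a placeholder.
html = """
<table><tr><td class="church_detail">
  建檔 ID：811223 <br>
  分類：教育訓練 <br>
  電郵： <a href="mailto:info@example.org">info@example.org</a>
</td></tr></table>
"""

soup = BeautifulSoup(html, 'html.parser')
cell = soup.find('td', 'church_detail')

raw = list(cell.strings)             # keeps the surrounding spaces and newlines
clean = list(cell.stripped_strings)  # strips each piece and drops empty ones

print(raw)
print(clean)
```

Notice that the '電郵：' label and its value come out as two separate items, because the value lives in its own tag.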
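What I want in the end is the format [ {dic1}, {dic2}, {'建檔 ID': '811223', '分類': '教育訓練', ...}, {dic4}, ... ], so a few problems remain:
1. '回報資料錯誤 ' (the "report incorrect data" link text) is not something we want.
2. Labels such as '電郵' (email) and '網址' (website) are cut off from their values: the actual data ends up in the next element of the list. (The same goes for '宗派' (denomination) and '母會' (mother church); the fields differ slightly from church to church.)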
First prepare a few containers: one empty dictionary and two empty lists. These have to live outside the function.
from bs4 import BeautifulSoup

myDict = {}
myList = []
tmpList = []

def get_detail(url, s):
    # s is presumably a requests.Session (anything with a .get(url) method works)
    print(url)
    res = s.get(url)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'lxml')
    detail = soup.find_all('td', 'church_detail')
    # Start clean so values from the previous page do not leak into this one
    tmpList.clear()
    myDict.clear()
    for ddd in detail[0].stripped_strings:
        if '回報資料錯誤 >' in ddd:
            # Skip the '回報資料錯誤 >' link text and move on to the next string
            continue
        else:
            tmpList.append(ddd)  # everything else goes into tmpList
    for i, item in enumerate(tmpList):
        if item == "":
            # Empty string: nothing to do, move on
            continue
        elif any(x in item for x in ['電郵', '網址', '宗派', '母會']):
            # For these labels the value sits in the next element of tmpList:
            # strip the '：' from the label and use it as the key
            if i + 1 < len(tmpList):
                myDict[item.replace('：', '')] = tmpList[i + 1]
        else:
            try:
                # Otherwise split on '：': left part is the key, right part the value
                myDict[item.split('：')[0]] = item.split('：')[1]
            except IndexError:
                # No '：' in the string: skip it
                pass
    # Append a copy; appending myDict itself would make every entry the same object
    myList.append(dict(myDict))
With that, get_detail should be more or less done. I will test it this afternoon when I have time.
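For comparison, here is a sketch of a variant that avoids the shared containers by returning a fresh dict on every call. This is not the code from the post: parse_fields is a hypothetical helper. Instead of a fixed label list, it assumes that an item ending in '：' takes its value from the next item, unless that next item is itself a "label：value" pair. The e-mail address and URL are placeholders.

```python
def parse_fields(strings, sep='：', skip_marker='回報資料錯誤'):
    """Build one new dict from the stripped strings of a single detail cell."""
    items = [s.strip() for s in strings if s.strip() and skip_marker not in s]
    record = {}
    i = 0
    while i < len(items):
        key, found, value = items[i].partition(sep)
        i += 1
        if not found:
            continue  # no separator, so this is not a labelled field
        value = value.strip()
        # Value missing: borrow the next item unless it is itself a labelled field
        if not value and i < len(items) and sep not in items[i]:
            value = items[i]
            i += 1
        record[key.strip()] = value
    return record

# Hypothetical input, shaped like the list described in the post
sample = ['建檔 ID：811223', '分類：教育訓練', '回報資料錯誤 >',
          '電郵：', 'info@example.org', '網址：', 'http://example.org']

records = [parse_fields(sample), parse_fields(sample)]
print(records[0])
```

Because each call builds its own dict, the results can be appended to a list directly without copying or clearing anything between pages.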