[Python/Crawling] urllib - Extracting Specific Data and Saving It to a File (JSON)


ekwkqk12 2018. 5. 4. 03:51

※ Saving data to a JSON file

The code below uses the json module to extract each book's title and the link to its detail page from the book list page, then saves them to a JSON file.

The key functions used in the code are listed in the table.
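As a minimal sketch of how the extraction calls (re.findall, re.search, re.sub, unescape) behave, the snippet below runs them on a small inline HTML string instead of the live page. The sample markup, the `p_code=B0001` link, and the helper name `extract_books` are assumptions made up for illustration; they are not taken from the actual site.

```python
import re
from html import unescape

# Hypothetical sample markup imitating one row of the book list page.
SAMPLE_HTML = '''
<tr>
  <td class="left"><a href="/store/books/look.php?p_code=B0001">Python &amp; Crawling</a></td>
  <td>2018-05-04</td>
</tr>
'''


def extract_books(html, base_url="http://www.hanbit.co.kr"):
    """Return a list of {"BookName", "Link"} dicts found in html."""
    books = []
    # re.DOTALL lets '.' match newlines, so a cell that spans
    # several lines is still captured as a single string.
    for cell in re.findall(r'<td class="left"><a.*?</td>', html, re.DOTALL):
        # re.search returns the first match; group(1) is the href value.
        href = re.search(r'<a href="(.*?)">', cell).group(1)
        # re.sub removes every tag, leaving only the text content.
        title = re.sub(r'<.*?>', '', cell)
        # unescape turns entities such as &amp; back into characters.
        books.append({"BookName": unescape(title), "Link": base_url + href})
    return books


books = extract_books(SAMPLE_HTML)
print(books)
```

Wrapping the loop in a function is only a convenience for trying the patterns on different inputs; the logic is the same as in the script.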

import json
import re
from urllib.request import urlopen
from html import unescape

# Fetch the web page
req = urlopen("http://www.hanbit.co.kr/store/books/full_book_list.html")
encoding = req.info().get_content_charset(failobj="utf-8")
html = req.read().decode(encoding)

# Create the output file
with open("booklist.json", "w", encoding="utf-8") as f:
    data = []
    # Extract the data
    for partial_html in re.findall(r'<td class="left"><a.*?</td>', html, re.DOTALL):
        url = re.search(r'<a href="(.*?)">', partial_html).group(1)
        url = 'http://www.hanbit.co.kr' + url
        title = re.sub(r'<.*?>', '', partial_html)
        title = unescape(title)
        data.append({"BookName": title, "Link": url})

    # Print the data as JSON (once, after the loop, instead of on every iteration)
    print(json.dumps(data, ensure_ascii=False, indent=2))

    # Serialize the data and write it to the file
    json.dump(data, f, ensure_ascii=False, indent=2)
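The script passes ensure_ascii=False to both json.dumps and json.dump. A short sketch of what that option changes, using a made-up record with a Korean title (the record is an assumption for illustration only):

```python
import json

# Hypothetical record with a Korean title, used only for illustration.
record = [{"BookName": "파이썬", "Link": "http://www.hanbit.co.kr"}]

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(record)
# ensure_ascii=False: Korean text is written out as-is.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same data, so the option only affects how readable the file is when opened directly. This is also why the output file is opened with encoding="utf-8".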

Opening the generated JSON file confirms that the book titles and links were saved, as shown in the figure.
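The same check can be done in code by loading the file back with json.load. The sketch below assumes a small made-up data set and writes to a temporary directory so that it does not overwrite a real booklist.json; the sample entry is hypothetical.

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for the scraped book list.
sample = [{"BookName": "Sample Book", "Link": "http://www.hanbit.co.kr/sample"}]

# Write to a temporary directory so the sketch does not touch booklist.json.
path = os.path.join(tempfile.mkdtemp(), "booklist.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False, indent=2)

# Read the file back and confirm it has the same structure.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(len(loaded), "entries")
for book in loaded:
    print(book["BookName"], book["Link"])
```

To check the real output, point `path` at the booklist.json produced by the script and skip the write step.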
