[Python/Crawling] urllib - Extracting Specific Data and Saving It to a File (DB)

ekwkqk12 2018. 5. 4. 04:02

※ Saving data to a DB file

The code below uses the sqlite3 module to extract each book's title and the link to its detail page from a book list page, then saves them to a DB file.

The key functions used in the code are listed in the table below.
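One of the calls the listing relies on, get_content_charset, can be tried offline. The sketch below is only an illustration: it builds a header object by hand with email.message.Message (the base class of the object that urlopen(...).info() returns), and the helper name charset_of and the header values are made up for the demo.

```python
from email.message import Message


def charset_of(content_type, default="utf-8"):
    """Return the charset declared in a Content-Type value, or `default`.

    Hypothetical helper for illustration; it mimics what
    req.info().get_content_charset(failobj="utf-8") does on a real response.
    """
    headers = Message()
    headers["Content-Type"] = content_type
    return headers.get_content_charset(failobj=default)


# The charset is returned in lower case when the header declares one.
print(charset_of("text/html; charset=EUC-KR"))
# With no charset parameter, the failobj fallback is returned instead.
print(charset_of("text/html"))
```

This is why the crawler can decode the page even when the server does not announce an encoding: the fallback "utf-8" is used in that case.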

from urllib.request import urlopen
import re
import sqlite3
from html import unescape

# Request the web page
req = urlopen("http://www.hanbit.co.kr/store/books/full_book_list.html")
# Use the charset from the response headers, falling back to utf-8
encoding = req.info().get_content_charset(failobj="utf-8")
html = req.read().decode(encoding)

# Set up the DB (the table is recreated on every run)
conn = sqlite3.connect("booklist_db.db")
c = conn.cursor()
c.execute("DROP TABLE IF EXISTS booklist")
c.execute("""CREATE TABLE booklist(title text, link text)""")

# Extract the data with regular expressions
for partial_html in re.findall(r'<td class="left"><a.*?</td>', html, re.DOTALL):
    # The href is a relative path, so prepend the site address
    url = re.search(r'<a href="(.*?)">', partial_html).group(1)
    url = 'http://www.hanbit.co.kr' + url
    # Strip the tags to leave only the title, then decode HTML entities
    title = re.sub(r'<.*?>', '', partial_html)
    title = unescape(title)

    # Save the row to the DB file
    c.execute("INSERT INTO booklist(title, link) VALUES (?,?)", (title, url))

# Commit once after all rows are inserted, then close the connection
conn.commit()
conn.close()
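To see what the regular expressions do without touching the network, the extraction step can be pulled into a function and run on a hand-written fragment. This is a minimal sketch, not the author's code: the function name extract_books, the None check, and the sample HTML (including its /books/example-N paths) are assumptions made for the demo.

```python
import re
from html import unescape

BASE_URL = "http://www.hanbit.co.kr"


def extract_books(html):
    """Return a list of (title, url) tuples found in `html`."""
    books = []
    for partial_html in re.findall(r'<td class="left"><a.*?</td>', html, re.DOTALL):
        match = re.search(r'<a href="(.*?)">', partial_html)
        if match is None:
            # Skip cells whose anchor does not have the expected shape
            continue
        url = BASE_URL + match.group(1)
        # Remove every tag, then turn entities such as &amp; back into text
        title = unescape(re.sub(r'<.*?>', '', partial_html))
        books.append((title, url))
    return books


# Made-up fragment that imitates the structure the patterns expect
sample_html = '''
<table>
  <tr><td class="left"><a href="/books/example-1">Sample &amp; Example Book</a></td></tr>
  <tr><td class="left"><a href="/books/example-2">Another Title</a></td></tr>
</table>
'''

for title, url in extract_books(sample_html):
    print(title, url)
```

Keeping the parsing in its own function also makes it easy to check the patterns again if the real page layout changes.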

Inspecting the generated DB file shows that the book titles and links have been saved in the booklist table, as in the figure below.
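The same check can also be done from Python rather than a DB viewer. The sketch below is an assumption-laden illustration: summarize_booklist is a hypothetical helper, and the demo runs against an in-memory database filled with placeholder rows so that it works without the crawled file. To inspect the real output, pass sqlite3.connect("booklist_db.db") instead.

```python
import sqlite3


def summarize_booklist(conn, limit=3):
    """Return (row_count, first `limit` rows) from the booklist table."""
    total = conn.execute("SELECT COUNT(*) FROM booklist").fetchone()[0]
    rows = conn.execute(
        "SELECT title, link FROM booklist ORDER BY rowid LIMIT ?", (limit,)
    ).fetchall()
    return total, rows


# Demo database with placeholder rows (not real crawl results)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE booklist(title text, link text)")
demo.executemany(
    "INSERT INTO booklist(title, link) VALUES (?, ?)",
    [
        ("Placeholder Book 1", "http://www.hanbit.co.kr/books/example-1"),
        ("Placeholder Book 2", "http://www.hanbit.co.kr/books/example-2"),
    ],
)

total, rows = summarize_booklist(demo)
print("rows in booklist:", total)
for title, link in rows:
    print(title, link)
demo.close()
```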
