From Python to Database or CSV

Question

I'm quite new to Python and scraping, but so far I have put this code together to get the artist and title of each song from the site.

When I run the code I get first a list of the artists followed by a list of the titles.

My question is: how do I get these results into a database or a csv file?

I have Notepad++ working for Python, plus PyCharm and IDLE, and this bit of code works OK with all three. Any suggestions are most welcome.

from urllib import urlopen  # Python 2; on Python 3 use: from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")

# Print every artist on the page
bsObj = BeautifulSoup(html)
nameList = bsObj.findAll("div", {"class": "artist"})
for name in nameList:
    print(name.get_text())

# Fetch the page again and print every song title
html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")
bsObj = BeautifulSoup(html)
nameList = bsObj.findAll("div", {"class": "title"})
for name in nameList:
    print(name.get_text())

Try [csv](https://docs.python.org/2/library/csv.html) and let us know if you have problems. — Peter Wood, Aug 13 '15 at 19:40

wpercy · Answer 1 · 2015-08-13T20:07:02.823

2

This should write to a two-column csv file where the first column is the artist and the second column is the song title.

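import csv
from urllib import urlopen  # Python 2; on Python 3 use: from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")

bsObj = BeautifulSoup(html)
artistList = bsObj.findAll("div", {"class": "artist"})
songList = bsObj.findAll("div", {"class": "title"})
# Extract the text of each element and trim surrounding whitespace
artists = [a.getText().strip() for a in artistList]
songs = [s.getText().strip() for s in songList]

# 'wb' is correct for Python 2; on Python 3 use open('csvfile.csv', 'w', newline='')
with open('csvfile.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=",")
    # zip pairs each artist with the song at the same position
    for c in zip(artists, songs):
        writer.writerow(c)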

edited Aug 13 '15 at 20:07

answered Aug 13 '15 at 19:54

wpercy


Hi wilbur, many thanks for your code; it works great. But I can't seem to add any more fields to it (in this case the labelList). Also, am I right in saying that if a csv file doesn't exist then Python creates one? Regards, looknow – looknow Aug 14 '15 at 11:06
Yes, Python will indeed create the file if it does not exist. In order to add more fields, you should just be able to do something like `labelList = bsObj.findAll("div", {"class": "label"})` and create another list called labels like this: `labels = [ l.getText().strip() for l in labelList ]`. Then add `labels` to the zip function, like `for c in zip(artists, songs, labels):` – wpercy Aug 14 '15 at 14:13
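
The three-column version described in the comment above could look roughly like the sketch below. It is written for Python 3 (text mode with `newline=''` instead of `'wb'`) and uses hard-coded sample lists in place of the scraped results so that it runs without network access. The `write_chart_csv` helper, the header row, the sample values, and the file name `chart.csv` are illustrative assumptions, not part of the original answer.

```python
import csv


def write_chart_csv(path, artists, songs, labels):
    """Write one row per chart entry: artist, song title, label."""
    # newline='' lets the csv module control line endings (Python 3).
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["artist", "song", "label"])  # header row (assumed)
        # zip() stops at the shortest list, so the three lists
        # should be the same length to avoid silently dropping rows.
        for row in zip(artists, songs, labels):
            writer.writerow(row)


# Hypothetical sample data standing in for the scraped lists.
artists = ["Artist A", "Artist B"]
songs = ["Song One", "Song Two"]
labels = ["Label X", "Label Y"]

write_chart_csv("chart.csv", artists, songs, labels)
```

If the page has entries without a label, the lists will differ in length and the columns can drift out of step, so it may be worth comparing `len(artists)`, `len(songs)` and `len(labels)` before writing.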

score 0 · Answer 2 · answered Aug 13 '15 at 20:15

0

Or you could simply use the pandas `to_csv` method:

from pandas import DataFrame as df
from urllib.request import urlopen  # Python 3 import path
from bs4 import BeautifulSoup

html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")

bsObj = BeautifulSoup(html)
DB = df(columns=['artists', 'songs'])
artistList = bsObj.findAll("div", {"class": "artist"})
songList = bsObj.findAll("div", {"class": "title"})
# Both lists must have the same length to fit in one DataFrame
DB['artists'] = [a.getText().strip() for a in artistList]
DB['songs'] = [s.getText().strip() for s in songList]

# to_csv also writes the row index as the first column; pass index=False to omit it
DB.to_csv('csvfile.csv')

answered Aug 13 '15 at 20:15

Vlad Mironov


Hi Vlad, thanks for your code. I am having difficulty downloading pandas. Can you suggest a simple way to download it onto a Windows machine? Many thanks, looknow – looknow Aug 14 '15 at 11:07
Do you have pip? If you don't, check this: http://stackoverflow.com/questions/4750806/how-to-install-pip-on-windows – Vlad Mironov Aug 14 '15 at 11:26
Hi Vlad, I have installed pandas, but when I run the program I get 'import error cannot import name DataFrame'. I then tried to add DataFrame via pip and got 'could not find a version that satisfies the requirement (DataFrame) from versions, no matching distribution found'. Have you any suggestion as to what the problem is? Kind regards – looknow Aug 15 '15 at 10:53
DataFrame is a class in pandas, not a separate package, so there is obviously something wrong with your installation. – Vlad Mironov Aug 16 '15 at 06:50
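
For the database half of the question, which neither answer covers, one option is SQLite through the standard-library `sqlite3` module. The following is a minimal Python 3 sketch under stated assumptions, not a definitive implementation: the table name `chart`, its columns, the `save_chart` helper, and the sample rows are all made up for illustration, the `position` column assumes the scraped lists are already in chart order, and an in-memory database is used so that nothing is written to disk.

```python
import sqlite3


def save_chart(conn, artists, songs):
    """Store (position, artist, song) rows in a table named 'chart' (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chart "
        "(position INTEGER, artist TEXT, song TEXT)"
    )
    # Number the rows from 1, assuming the lists are in chart order.
    rows = [
        (position, artist, song)
        for position, (artist, song) in enumerate(zip(artists, songs), start=1)
    ]
    # Parameter placeholders (?) keep quotes in titles from breaking the SQL.
    conn.executemany("INSERT INTO chart VALUES (?, ?, ?)", rows)
    conn.commit()


# Hypothetical sample data standing in for the scraped lists.
artists = ["Artist A", "Artist B"]
songs = ["Song One", "Song Two"]

conn = sqlite3.connect(":memory:")
save_chart(conn, artists, songs)

for row in conn.execute(
    "SELECT position, artist, song FROM chart ORDER BY position"
):
    print(row)
```

Passing a file name such as `charts.db` (a hypothetical name) to `sqlite3.connect` instead of `":memory:"` would keep the data on disk between runs.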
