
I'm quite new to Python and scraping, but so far I've put together this code to get the artist and title of each song from the site.

When I run the code, I get a list of the artists followed by a list of the titles.

My question is: how do I get these results into a database or a CSV file?

I have Notepad++ set up for Python, as well as PyCharm and IDLE, and this code works OK in all three. Any suggestions are most welcome.

# Python 2; on Python 3 use: from urllib.request import urlopen
from urllib import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")
bsObj = BeautifulSoup(html, "html.parser")

# First loop: every artist on the chart
nameList = bsObj.findAll("div", {"class": "artist"})
for name in nameList:
    print(name.get_text())

# Second loop: every title, from the same parsed page (no need to download it twice)
nameList = bsObj.findAll("div", {"class": "title"})
for name in nameList:
    print(name.get_text())
Michael Currie
looknow

2 Answers

This should write a two-column CSV file in which the first column is the artist and the second is the song title.

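import csv
from urllib import urlopen  # Python 2; on Python 3: from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")

bsObj = BeautifulSoup(html, "html.parser")
artistList = bsObj.findAll("div", {"class": "artist"})
songList = bsObj.findAll("div", {"class": "title"})
# Strip surrounding whitespace/newlines from each cell's text
artists = [a.getText().strip() for a in artistList]
songs = [s.getText().strip() for s in songList]

# Python 2's csv module expects a binary-mode file ('wb');
# on Python 3, use open('csvfile.csv', 'w', newline='') instead.
with open('csvfile.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=",")
    # zip pairs the nth artist with the nth title: one row per chart entry
    for c in zip(artists, songs):
        writer.writerow(c)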
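The code above targets Python 2, where the csv module wants a file opened in binary mode (`'wb'`). On Python 3 that mode makes `writerow` fail with a `TypeError`, because the writer produces `str`, not bytes. A minimal Python 3 sketch of just the writing step, assuming the `artists` and `songs` lists built above; `write_pairs` is a hypothetical helper name and the header row is an optional extra:

```python
import csv


def write_pairs(path, artists, songs):
    """Write one row per (artist, song) pair to a CSV file (Python 3).

    zip() pairs items by position, so this assumes the nth artist div and
    the nth title div on the page belong to the same chart entry. It also
    stops at the shorter list, silently dropping any unmatched extras.
    """
    # Text mode plus newline="" lets the csv module control line endings.
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)  # comma is already the default delimiter
        writer.writerow(["artist", "song"])  # optional header row (hypothetical)
        writer.writerows(zip(artists, songs))
```

Usage: `write_pairs('csvfile.csv', artists, songs)`.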
wpercy
  • Hi wilbur, many thanks for your code; it works great. But I can't seem to add any more fields to it (in this case the labelList). Also, am I right in saying that if a CSV file doesn't exist, Python creates one? Regards, looknow – looknow Aug 14 '15 at 11:06
  • Yes, Python will indeed create the file if it does not exist. To add more fields, you should just be able to do something like `labelList = bsObj.findAll("div", {"class": "label"})` and create another list called `labels` like this: `labels = [ l.getText().strip() for l in labelList ]`. Then add `labels` to the zip function, like `for c in zip(artists, songs, labels):` – wpercy Aug 14 '15 at 14:13
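The comment above extends `zip` to a third list. A hedged Python 3 sketch of the same idea for any number of columns; `write_columns` is a hypothetical helper name, and a `label` column assumes the page really has `<div class="label">` elements, which the comment suggests but this thread doesn't confirm:

```python
import csv


def write_columns(path, header, *columns):
    """Write parallel lists (e.g. artists, songs, labels) as CSV columns.

    Python 3 version. Raises ValueError instead of letting zip() silently
    drop rows when the scraped lists have different lengths.
    """
    if len(header) != len(columns):
        raise ValueError("need one header name per column")
    lengths = {len(col) for col in columns}
    if len(lengths) > 1:
        raise ValueError("column lengths differ: %s" % sorted(lengths))
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(header)
        # zip(*columns) turns the column lists into rows: (artist, song, label), ...
        writer.writerows(zip(*columns))
```

Usage: `write_columns('csvfile.csv', ['artist', 'song', 'label'], artists, songs, labels)`. The length check is there because pairing by position is only safe when every chart entry contributes exactly one div of each class.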

Alternatively, you could simply use the pandas `to_csv` function:

from pandas import DataFrame as df
from urllib.request import urlopen  # Python 3
from bs4 import BeautifulSoup

html = urlopen("http://www.officialcharts.com/charts/singles-chart/19800203/7501/")

bsObj = BeautifulSoup(html, "html.parser")
DB = df(columns=['artists', 'songs'])
artistList = bsObj.findAll("div", {"class": "artist"})
songList = bsObj.findAll("div", {"class": "title"})
# The first assignment sizes the empty frame; the second list must have the same length
DB['artists'] = [a.getText().strip() for a in artistList]
DB['songs'] = [s.getText().strip() for s in songList]

# to_csv also writes the row index as the first column; pass index=False to omit it
DB.to_csv('csvfile.csv')
Vlad Mironov
  • Hi Vlad, thanks for your code. I am having difficulty downloading pandas; can you suggest a simple way to install it on a Windows machine? Many thanks, looknow – looknow Aug 14 '15 at 11:07
  • Do you have pip? If you don't, check this: http://stackoverflow.com/questions/4750806/how-to-install-pip-on-windows – Vlad Mironov Aug 14 '15 at 11:26
  • Hi Vlad, I have installed pandas, but when I run the program I get 'ImportError: cannot import name DataFrame'. I then tried to add DataFrame via pip and got 'could not find a version that satisfies the requirement DataFrame' and 'no matching distribution found'. Do you have any suggestion as to what the problem is? Kind regards – looknow Aug 15 '15 at 10:53
  • DataFrame is a class inside pandas, not a separate package you can install with pip, so something must be wrong with your installation. – Vlad Mironov Aug 16 '15 at 06:50
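Since pandas was the sticking point here, and the question also asked about a database, note that the standard library's `sqlite3` module can store the same pairs with nothing extra to install. A minimal sketch, assuming `artists` and `songs` lists built as in the answers above; the helper name `save_to_sqlite`, the table `singles` and the file `chart.db` are hypothetical:

```python
import sqlite3


def save_to_sqlite(db_path, artists, songs):
    """Insert (artist, song) pairs into a SQLite table.

    Returns the total number of rows in the table afterwards, which
    includes rows saved by earlier runs against the same file.
    """
    conn = sqlite3.connect(db_path)  # creates the file if it doesn't exist
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "CREATE TABLE IF NOT EXISTS singles (artist TEXT, song TEXT)"
            )
            # ? placeholders let sqlite3 quote the scraped text safely
            conn.executemany(
                "INSERT INTO singles (artist, song) VALUES (?, ?)",
                zip(artists, songs),
            )
        return conn.execute("SELECT COUNT(*) FROM singles").fetchone()[0]
    finally:
        conn.close()  # the with-block commits but does not close the connection
```

Usage: `save_to_sqlite('chart.db', artists, songs)`. Re-running it appends the same rows again, so a real scraper would probably want a date or chart-ID column to tell runs apart.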