Suppose I have a dataframe as follows:
data
id URL
1 www.pandora.com
2 m.jcpenney.com
3 www.youtube.com
4 www.facebook.com
I want to search the URL column for particular words and store the match in a new column. For example, if I only want to find youtube and facebook here, my ideal output would be:
id URL host
1 www.pandora.com None
2 m.jcpenney.com None
3 www.youtube.com youtube
4 www.facebook.com facebook
The URLs in the real data set are much more complex, and the number of rows is very large (~4M). So I only want to find 3-4 particular hosts and identify them in a new column.
Here is my attempt:
import re
for i in data['URL']:
    re.search('youtube', i)
but I get the following error:
TypeError: expected string or buffer
I want to create a new column in the same dataframe that holds the matched host for the 3-4 hosts I specify and None for all remaining rows. Can anybody help me?
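To make the goal concrete, here is a minimal sketch of the kind of result I am after, rebuilt on the sample data above. It assumes pandas' Series.str.extract is a reasonable fit; the names hosts and pattern are made up for illustration, and the astype(str) call is only a guess that the TypeError comes from non-string values (such as missing URLs) in the column.

```python
import re

import pandas as pd

# Sample data from the question.
data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "URL": ["www.pandora.com", "m.jcpenney.com",
            "www.youtube.com", "www.facebook.com"],
})

# Hypothetical list of hosts to look for.
hosts = ["youtube", "facebook"]

# One capture group matching any of the hosts, e.g. "(youtube|facebook)".
pattern = "(" + "|".join(re.escape(h) for h in hosts) + ")"

# astype(str) is an assumption: it coerces non-string values to text
# so that the regex is only ever applied to strings.
# expand=False returns a Series; rows without a match become missing values.
data["host"] = data["URL"].astype(str).str.extract(pattern, expand=False)
```

Rows that do not match end up as missing values (NaN) rather than a literal None, and the extraction runs on the whole column at once instead of looping row by row.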
Thanks