I'm extracting user comments from a range of websites (like reddit.com) and Youtube is also another juicy source of information for me. My existing scraper is written in R:
# x is the url
html = getURL(x)
doc = htmlParse(html, asText=TRUE)
txt = xpathSApply(doc,
//body//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)]",xmlValue)
This doesn't work on Youtube data, in fact if you look at the source of a Youtube video like this for example, you'd find that comments do not appear in the source.
Does anyone have any suggestions on how to extract data in such circumstances?
Many thanks!