
Due to global IT settings, I am having a hard time using htmlParse or read_html. The workaround for my purposes was to fetch the page with readLines from the base package and then parse the result with htmlParse. Is there a disadvantage to this approach that I am not aware of?

At least for my MWE, both approaches seem to yield the same output. This may be different for more elaborate HTML.

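library(XML)

mailing_url <- "http://www.r-project.org/mail.html"

# Download the raw HTML as a character vector, one element per line
mailing_lines <- readLines(mailing_url)

# Parse the downloaded lines, then let htmlParse fetch the URL itself
mailing_doc.RL <- htmlParse(mailing_lines)
mailing_doc.HTML <- htmlParse(mailing_url)

all.equal(mailing_doc.RL, mailing_doc.HTML)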
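I am not sure that all.equal() actually inspects the parsed trees, since both objects wrap external pointers. A minimal sketch of a stricter check, assuming that comparing the serialized markup and the extracted node values is good enough, could look like this. The inline page `html_lines` is a hypothetical stand-in so the example runs without network access; `same_markup` and `same_text` are names I made up for illustration.

```r
library(XML)

# Hypothetical inline page, standing in for the output of readLines(url)
html_lines <- c(
  "<html><head><title>Mailing Lists</title></head>",
  "<body><p>R-help</p><p>R-devel</p></body></html>"
)

# Route 1: pass the character vector of lines, explicitly marked as text
doc_lines <- htmlParse(html_lines, asText = TRUE)

# Route 2: parse the same content collapsed into a single string
doc_string <- htmlParse(paste(html_lines, collapse = "\n"), asText = TRUE)

# Compare the serialized documents instead of the pointer objects
same_markup <- identical(saveXML(doc_lines), saveXML(doc_string))

# Compare the extracted content, which is what matters downstream
same_text <- identical(
  xpathSApply(doc_lines, "//p", xmlValue),
  xpathSApply(doc_string, "//p", xmlValue)
)
```

With the real page, `doc_string` would be replaced by `htmlParse(mailing_url)` and `html_lines` by `readLines(mailing_url)`. Any difference in encoding handling between the two routes should then show up in `same_markup` or `same_text`.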
Nisse Engström
Max M
  • What exactly do your IT settings prevent, such that this works but using `htmlParse` directly doesn't? I wouldn't think they would be any different. – MrFlick Jul 02 '18 at 20:14
  • I am at my home computer, but I think it was something like `could not resolve host name`. I am trying to contact my IT department as well, but they are rather touchy on these topics and I do not want to wake any sleeping dogs. Since the code works on my home computer and not on my office computer, I am assuming it is because of the IT settings. – Max M Jul 02 '18 at 20:18
  • It's hard to believe that `readLines` would work but `htmlParse` would fail. They would both have to resolve host names. You're sure they are different? – MrFlick Jul 02 '18 at 20:20
  • I would not know. That is why I am asking :) – Max M Jul 02 '18 at 20:47
  • For instance, using `read_html` yields `Error in open.connection(x, "rb") : Could not resolve host: www.r-project.org` – Max M Jul 03 '18 at 10:51
  • I found a workaround for my IT problem: https://stackoverflow.com/questions/36043172/package-rvest-for-web-scraping-https-site-with-proxy/38463559#38463559 and https://stackoverflow.com/questions/33295686/rvest-error-in-open-connectionx-rb-timeout-was-reached. This is not an answer to my original question, but it makes the question unnecessary for me. – Max M Jul 04 '18 at 07:19

0 Answers