I noticed that Facebook has some odd class names that look computer-generated. What I don't know is whether these classes stay constant over time or change at some interval. Maybe someone with experience can answer. The only thing I can see is that they are still the same after I exit Chrome and open it again, so at least they don't change with every browser session.

So I'd guess the best way to scrape Facebook would be to anchor on elements of the user interface and assume the structure stays the same. For example, to get the address from the About section, something like this:

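from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service("C:/chromedriver.exe"))

driver.get("https://www.facebook.com/pg/Burma-Superstar-620442791345784/about/?ref=page_internal")
# wait some time for the page to render
# Go from the "FIND US" label to the "Get Directions" button, then back to the address block just before it
address_elements = driver.find_elements(By.XPATH, "//span[text()='FIND US']/../following-sibling::div//button[text()='Get Directions']/../../preceding-sibling::div[1]/div/span")
for item in address_elements:
    print(item.text)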
DebanjanB
DoctorEvil
    The best way to scrape Facebook is not to scrape Facebook because Facebook doesn't allow it. And yes of course the class names can change. – WizKid Mar 04 '19 at 22:14

1 Answer

You are largely correct. Facebook is built with ReactJS, which is evident from the following keywords and tags in the HTML DOM:

  • {"react_render":true,"reflow":true}
  • <!-- react-mount-point-unstable -->
  • ["React-prod"]
  • ["ReactDOM-prod"]
  • ReactComposerTaggerType:{r:["t5r69"],be:1}

So the dynamically generated class names are bound to change at intervals and should not be relied on as locators.


Solution

The solution is to build the locator strategy from static attributes, such as visible text, rather than from the generated class names.
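
To illustrate why a text-anchored locator survives a class-name change, here is a minimal sketch that uses only the Python standard library. The two markup snapshots, the class names and the address are hypothetical placeholders, not real Facebook markup, and the helper names are invented for this example.

```python
from html.parser import HTMLParser


class SpanCollector(HTMLParser):
    """Record the class attribute and text of every <span>, in document order."""

    def __init__(self):
        super().__init__()
        self.spans = []  # [class_attr, text] pairs
        self._open = []  # indexes of the spans that are currently open

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.spans.append([dict(attrs).get("class") or "", ""])
            self._open.append(len(self.spans) - 1)

    def handle_endtag(self, tag):
        if tag == "span" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text is credited to the innermost open span only
        if self._open:
            self.spans[self._open[-1]][1] += data


def collect_spans(html):
    parser = SpanCollector()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace, similar to XPath's normalize-space()
    return [(cls, " ".join(text.split())) for cls, text in parser.spans]


def text_by_class(html, class_name):
    """Brittle lookup: depends on a generated class name."""
    return [text for cls, text in collect_spans(html) if class_name in cls.split()]


def text_after_label(html, label, offset=1):
    """Robust lookup: find the span holding `label`, then step `offset` spans forward."""
    spans = collect_spans(html)
    for i, (_, text) in enumerate(spans):
        if text == label and i + offset < len(spans):
            return spans[i + offset][1]
    return None


# Hypothetical snapshots: same structure and text, different generated class names
SNAPSHOT_OLD = '<div><span class="_50f7">FIND US</span><div><span class="_2iem">123 Example St</span></div></div>'
SNAPSHOT_NEW = '<div><span class="x1a2b">FIND US</span><div><span class="y9z8">123 Example St</span></div></div>'

print(text_by_class(SNAPSHOT_NEW, "_2iem"))       # class-based lookup no longer matches
print(text_after_label(SNAPSHOT_NEW, "FIND US"))  # text-based lookup still works
```

The class-based lookup returns an empty list as soon as the class names change, while the label-based lookup keeps working as long as the visible text and the relative structure stay the same.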

To retrieve the first line of the address just below the text FIND US, wait for the element with WebDriverWait (from selenium.webdriver.support.ui) together with the expected_conditions module (imported as EC) and its visibility_of_element_located() condition, then read the element's text:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[normalize-space()='FIND US']//following::span[2]"))).text)
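
For reuse with other labels, the same wait can be wrapped in a small function. This is a sketch only: the helper names (xpath_literal, span_after_label, read_text_after_label) are invented for this example, and the default n=2 simply mirrors the XPath above. The Selenium imports are the ones the one-liner relies on.

```python
def xpath_literal(text):
    """Quote `text` as an XPath 1.0 string literal (XPath has no escape character)."""
    if "'" not in text:
        return "'%s'" % text
    if '"' not in text:
        return '"%s"' % text
    # Text contains both quote types: join the pieces with concat()
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join("'%s'" % p for p in parts) + ")"


def span_after_label(label, n=2):
    """XPath for the n-th <span> that follows the span whose text equals `label`."""
    return "//span[normalize-space()=%s]//following::span[%d]" % (xpath_literal(label), n)


def read_text_after_label(driver, label, n=2, timeout=20):
    """Wait until the located span is visible, then return its text."""
    # Imported here so the XPath helpers above stay usable without Selenium installed
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, span_after_label(label, n))
    element = WebDriverWait(driver, timeout).until(EC.visibility_of_element_located(locator))
    return element.text


print(span_after_label("FIND US"))
```

With a driver that has already loaded the page, read_text_after_label(driver, "FIND US") performs the same lookup as the one-liner and raises Selenium's TimeoutException if the element does not become visible within the timeout.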

Outro

Note: Scraping Facebook violates section 3.2.3 of their Terms of Service; you are liable to be questioned and may even land in Facebook Jail. Use the Facebook Graph API instead.

DebanjanB
  • Thanks for the answer. I never even tried to run a script to scrape Facebook; I would imagine they probably have a CAPTCHA. Is their API free and easy to get? (Noob questions, I know, but I can't find any info on pricing.) – DoctorEvil Mar 15 '19 at 15:18
  • @DoctorEvil I prepared some documentation on the Facebook Graph API last year. If I come across it, I will share it with you. The API won't be that tough for you, as you have mastered the Selenium-Python client implementation. – DebanjanB Mar 15 '19 at 15:23