1

I want to get the source code of an HTML document that sits inside an HTML tag generated by some JavaScript, and store it in a variable. The tag is an <iframe>, and it contains something that looks like a #document node. When I expand that node, I get a complete HTML document that looks something like <!DOCTYPE html> <html>...</html>

To summarize:

<iframe src="https://www.XXXXXX.com/" allow="autoplay; fullscreen" frameborder="no" scrolling="no" allowfullscreen="yes" style="width: 100%; height: 100%;">
    #document
        <!DOCTYPE html>
        <html>...</html>  // a whole new HTML document
</iframe>

I want to store the entire content of this HTML document as a string in Python.

What I have done:

driver.find_element_by_xpath('/path/to/iframe/tag').get_attribute('innerHTML')

but this just returns an empty string. I also tried reading the page body and parsing it with BeautifulSoup:

html = driver.execute_script("return document.body.innerHTML")
soup = BeautifulSoup(html, 'html5lib')
print(soup.prettify())

but this doesn't work either.

NOTE: I run these tests only after the script has executed. I suspect the problem is related to the #document node.

  • What about fetching the URL of the iframe and then calling driver.get again with that URL? Scott also suggested this, and he provided a code snippet. – Stephan Schrijver May 28 '19 at 18:05
  • You can't get the `iframe`'s `innerHTML`; you have to redirect to its `src`. – Mohammad Zamanian May 28 '19 at 18:09
  • Is the iframe of interest present in the initial response? This is not always the case, and if it isn't, you won't be able to extract the src from the response to make the next request, though you can take the src from the webpage manually and issue a request against that. – QHarr May 28 '19 at 18:39
  • @QHarr it is present in the initial response, but redirecting to its src is not that helpful... anyway, I got the answer by switching the driver's frame – Arihant Bedagkar May 28 '19 at 18:45
  • Doh.... should have said that! – QHarr May 28 '19 at 18:47

3 Answers

1

You can't get an iframe's content through innerHTML. It doesn't work even from JavaScript inside an HTML document you wrote yourself, as this example shows:

function Button(){
    var iframe = document.getElementsByTagName("iframe")[0];
    var p = document.getElementsByTagName("p")[0];
    p.innerHTML = "Result of iframe.innerHTML: " + iframe.innerHTML;
}
<iframe src="https://bing.com/"></iframe>
<br>
<button onclick="Button();">Click me to alert innerHTML</button>
<p></p>

Instead, navigate to the iframe's src and get the HTML content from there.

I haven't tested the following code, but I hope it helps.

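from bs4 import BeautifulSoup
from selenium import webdriver

# firefox_path and firefox_profile are assumed to be defined elsewhere (Selenium 3 style arguments)
driver = webdriver.Firefox(executable_path=firefox_path, firefox_profile=firefox_profile)
driver.get('https://example.com/')

# Find the first iframe in the outer page and read its src attribute
soup = BeautifulSoup(driver.page_source, 'html.parser')
iframe_source = soup.find('iframe')['src']

# Load the iframe's document directly and parse it
driver.get(iframe_source)
soup = BeautifulSoup(driver.page_source, 'html.parser')
html = soup.find('html')

# html.content would look up a child <content> tag and give None; print the markup instead
print(html.prettify())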
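
The snippet above assumes that the iframe has a src attribute and that the value is an absolute URL. A minimal sketch of a more defensive lookup follows. It swaps BeautifulSoup for the standard library's html.parser, and the names IframeSrcCollector and iframe_urls are hypothetical, chosen only for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcCollector(HTMLParser):
    """Collects the src attribute of every <iframe> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:  # skip iframes with a missing or empty src
                self.sources.append(src)


def iframe_urls(page_html, page_url):
    """Return absolute URLs for all iframes found in page_html.

    Relative src values are resolved against page_url.
    """
    collector = IframeSrcCollector()
    collector.feed(page_html)
    return [urljoin(page_url, src) for src in collector.sources]
```

With Selenium this could be called as iframe_urls(driver.page_source, driver.current_url). Loading the resulting URL directly can still fail when the site refuses direct access to it.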
Mohammad Zamanian
  • I tried testing the code you provided, but since the src URL is not permissible to access, the code fails... but I found my answer. Thanks a lot for your help! – Arihant Bedagkar May 28 '19 at 18:39
0

Why would you want an HTML document inside an HTML document? I don't think this is possible, but you could try hosting the HTML document on a different site and then embedding it with <iframe src="www.html-content.com"></iframe>

Kaochi
  • There is also a way to do this using `jQuery`. For that, I would suggest you look at this topic: [link](https://stackoverflow.com/questions/8988855/include-another-html-file-in-a-html-file) – Kaochi May 28 '19 at 18:02
  • I don't want to include an html file in an html file, instead, I have the page source of a webpage, means, it is already given to me... Can you elaborate more on that #document variable? I'm seeing it for the first time, due to which I am unable to get the html code which is inside it – Arihant Bedagkar May 28 '19 at 18:07
  • @ArihantBedagkar document is a virtual element, which doesn't really mean anything. – Mohammad Zamanian May 28 '19 at 18:13
  • @ArihantBedagkar more about ```#document``` on: https://stackoverflow.com/questions/21474605/what-does-document-mean – Mohammad Zamanian May 28 '19 at 18:14
  • @Scott the link you provided worked for me like a charm! The only thing to do was to switch the driver to the new frame. Thanks a lot! – Arihant Bedagkar May 28 '19 at 18:38
0

The answer is simple: I switched the driver from the current frame to the frame of the <iframe> element.

Code:

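driver.switch_to.default_content()  # start from the top-level document
# Newer Selenium 4 releases drop find_element_by_xpath; use driver.find_element('xpath', '//iframe') there
frame = driver.find_element_by_xpath('//iframe')
driver.switch_to.frame(frame)
html = driver.page_source  # now the iframe's document, not the outer page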
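
The same steps could be wrapped in a small helper that also switches back to the top-level document afterwards, so later lookups are not accidentally made inside the frame. This is only a sketch: the name read_iframe_html is hypothetical, driver is assumed to be an already started Selenium WebDriver, and the default XPath simply picks the first iframe on the page.

```python
def read_iframe_html(driver, xpath="//iframe"):
    """Return the source of the document inside the iframe matched by xpath.

    "xpath" is the string value of Selenium's By.XPATH, passed directly so
    this helper needs no Selenium import of its own.
    """
    driver.switch_to.default_content()       # start from the outer page
    frame = driver.find_element("xpath", xpath)
    driver.switch_to.frame(frame)            # enter the iframe's document
    try:
        return driver.page_source            # source of the iframe's document
    finally:
        driver.switch_to.default_content()   # always return to the outer page
```

A call such as html = read_iframe_html(driver) would then leave the iframe's markup in html as a plain Python string, ready to be passed to BeautifulSoup if needed.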