I'm trying to send an HTTP request to a website (for example, Digikey) and read back the full HTML. I'm using a URL of the form https://www.digikey.com/products/en?keywords=part_number to look up a part number, for example https://www.digikey.com/products/en?keywords=511-8002-KIT. However, what I get back is not the full HTML.
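import requests
from bs4 import BeautifulSoup

# Fetch the search results page for the part number
r = requests.get('https://www.digikey.com/products/en?keywords=511-8002-KIT')
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.prettify())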
Output:
<!DOCTYPE html>
<html>
<head>
<script>
var i10cdone =(function(){ function pingBeacon(msg){ var i10cimg = document.createElement('script'); i10cimg.src='/i10c@p1/botox/file/nv-loaded.js?status='+window.encodeURIComponent(msg); i10cimg.onload = function(){ (document.head || document.documentElement).removeChild(i10cimg) }; i10cimg.onerror = function(){ (document.head || document.documentElement).removeChild(i10cimg) }; ( document.head || document.documentElement).appendChild(i10cimg) }; pingBeacon('loaded'); if(String(document.cookie).indexOf('i10c.bdddb=c2-f0103ZLNqAeI3BH6yYOfG7TZlRtCrMwqUo')>=0) { document.cookie = 'i10c.bdddb=;path=/';}; var error=''; function errorHandler(e) { if (e && e.error && e.error.stack ) { error=e.error.stack; } else if( e && e.message ) { error = e.message; } else { error = 'unknown';}} if(window.addEventListener) { window.addEventListener('error',errorHandler, false); } else { if ( window.attachEvent ){ window.attachEvent('onerror',errorHandler); }} return function(){ if (window.removeEventListener) {window.removeEventListener('error',errorHandler); } else { if (window.detachEvent) { window.detachEvent('onerror',errorHandler); }} if(error) { pingBeacon('error-' + String(error).substring(0,500)); document.cookie='i10c.bdddb=c2-f0103ZLNqAeI3BH6yYOfG7TZlRtCrMwqUo;path=/'; }}; })();
</script>
<script src="/i10c@p1/client/latest/auto/instart.js?i10c.nv.bucket=pci&i10c.nv.host=www.digikey.com&i10c.opts=botox&bcb=1" type="text/javascript">
</script>
<script type="text/javascript">
INSTART.Init({"apiDomain":"assets.insnw.net","correlation_id":"1553546232:4907a9bdc85fe4e8","custName":"digikey","devJsExtraFlags":"{\"disableQuerySelectorInterception\" :true, 'rumDataConfigKey':'/instartlogic/clientdatacollector/getconfig/monitorprod.json','custName':'digikey','propName':'northamerica'}","disableInjectionXhr":true,"disableInjectionXhrQueryParam":"instart_disable_injection","iframeCommunicationTimeout":3000,"nanovisorGlobalNameSpace":"I10C","partialImage":false,"propName":"northamerica","rId":"0","release":"latest","rum":false,"serveNanovisorSameDomain":true,"third_party":["IA://www.digikey.com/js/geotargeting.js"],"useIframeRpc":false,"useWrapper":false,"ver":"auto","virtualDomains":4,"virtualizeDomains":["^auth\\.digikey\\.com$","^authtest\\.digikey\\.com$","^blocked\\.digikey\\.com$","^dynatrace\\.digikey\\.com$","^search\\.digikey\\.com$","^www\\.digikey\\.ca$","^www\\.digikey\\.com$","^www\\.digikey\\.com\\.mx$"]}
);
</script>
<script>
typeof i10cdone === 'function' && i10cdone();
</script>
</head>
<body>
<script>
setTimeout(function(){document.cookie="i10c.eac23=1";window.location.reload(true);},30);
</script>
</body>
</html>
The reason I need the full HTML is to search it for specific keywords, for example to check whether the terms "Lead free" or "Through hole" appear in the results for a particular part number. I'm doing this not only for Digikey but also for other sites.
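To make the goal concrete, here is a rough sketch of the keyword check I have in mind once the full HTML is available. It uses only the standard library's html.parser so it runs on its own; `find_keywords`, `_TextExtractor`, and the `sample` markup are names and data I made up for illustration, not anything taken from Digikey's pages.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def find_keywords(html, keywords):
    """Return {keyword: bool}, matching case-insensitively on visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so "Lead  free" still matches "Lead free".
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return {kw: kw.lower() in text for kw in keywords}


# Hypothetical markup standing in for a real product page.
sample = (
    "<html><body><td>Lead  free</td>"
    "<script>var x = 'Through hole';</script></body></html>"
)
result = find_keywords(sample, ["Lead free", "Through hole"])
print(result)
```

In this sample only "Lead free" is reported as present, because the other term appears only inside a script and not in the visible text.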
Any help would be appreciated!
Thanks!
EDIT:
Thank you all for your suggestions and answers. More info for others who are interested in this: Web-scraping JavaScript page with Python
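For anyone hitting the same thing: the response in the output above has no visible content, only scripts, and the one in the body sets a cookie and calls window.location.reload, so a plain HTTP client never sees the real page. Below is a rough heuristic for spotting that kind of response, so a script can tell when it needs a JavaScript-capable approach instead. The function name `looks_like_js_stub` and the "no visible text plus a reload call" rule are my own assumptions based on this one response and may not hold for other sites.

```python
from html.parser import HTMLParser


class _StubProbe(HTMLParser):
    """Count visible characters and collect inline script source."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.visible_chars = 0
        self.script_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.script_text.append(data)
        else:
            self.visible_chars += len(data.strip())


def looks_like_js_stub(html):
    """Heuristic: no visible text at all, and a script that reloads the page."""
    probe = _StubProbe()
    probe.feed(html)
    probe.close()
    scripts = " ".join(probe.script_text)
    return probe.visible_chars == 0 and "location.reload" in scripts


# Trimmed-down version of the response shown in the question.
stub = (
    "<!DOCTYPE html><html><head></head><body><script>"
    'setTimeout(function(){document.cookie="i10c.eac23=1";'
    "window.location.reload(true);},30);"
    "</script></body></html>"
)
# Hypothetical page with real content, for comparison.
real = "<html><body><h1>511-8002-KIT</h1><script>var a = 1;</script></body></html>"

print(looks_like_js_stub(stub), looks_like_js_stub(real))
```

If the check fires, the HTML from requests is not worth searching, and rendering the page with a real browser engine (as discussed in the linked question) is the next thing to try.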