So I have the following script:
    #!/usr/bin/env python3
    import requests
    from bs4 import BeautifulSoup

    def parse_marketwatch_calendar(url):
        # page = requests.get(url).text
        # soup = BeautifulSoup(page, "lxml")
        soup = BeautifulSoup(requests.get(url).text, "html5lib")
        print(soup)
        title_parent = soup.find("div", class_="column column --primary")
        print(title_parent)

    url = "https://www.marketwatch.com/economy-politics/calendar?mod=economy-politics"
    parse_marketwatch_calendar(url)
However, when I run it, I get the following repeated response:
<html><head><title>marketwatch.com</title><style>#cmsg{animation: A
1.5s;}@keyframes A{0%{opacity:0;}99%{opacity:0;}100%{opacity:1;}}</style></head><body style="margin:0"><p id="cmsg">Please enable JS and disable any ad blocker</p><script data-cfasync="false">var dd={'rt':'c','cid':'AHrlqAAAAAMA0byAgFOkDSoAGKNiPA==','hsh':'D428D51E28968797BC27FB9153435D','t':'bv','s':47891,'e':'a873fd745afc33d92e8f68dbadb4eaa3536c2c677a9785033316a437fb980f49','host':'geo.captcha-delivery.com'}</script><script data-cfasync="false" src="https://ct.captcha-delivery.com/c.js"></script></body></html> None
A JavaScript checker is intercepting the request, so I never reach the MarketWatch Economic Calendar itself. I suppose I could find a different source, but without switching to Selenium, is there any way to get to the actual page with requests or urllib? I was thinking of adding a user-agent header that looks JavaScript-enabled, but I don't know whether that would even work or whether I'm better off going to Selenium for the initial website call. Thanks! 🙂
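For reference, this is the kind of header change I had in mind. It's only a sketch: requests can't execute JavaScript, so a User-Agent by itself may not get past a challenge like this one (the response mentions captcha-delivery.com, i.e. DataDome), but browser-like headers are cheap to try. The specific User-Agent string here is just an example:

```python
import requests

def make_browser_session():
    """Return a requests.Session with browser-like default headers."""
    session = requests.Session()
    session.headers.update({
        # Example desktop-Chrome User-Agent string; any current browser UA works.
        "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/120.0.0.0 Safari/537.36"),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    })
    return session

session = make_browser_session()
# resp = session.get("https://www.marketwatch.com/economy-politics/calendar")
# print(resp.status_code)
```

Would something like this ever satisfy the JS check, or is that fundamentally impossible without a real browser engine behind the request?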