I have been trying to download a web page that I ultimately intend to scrape. The page uses JavaScript and includes a check for whether JavaScript is enabled, and every attempt so far comes back saying it is not enabled.
I am doing this under WSL2 (Ubuntu) on a Windows 10 machine. I have tried Selenium, headless Chrome, and axios, and I cannot figure out how to get the page's JavaScript to execute.
Since I want to run this from my crontab, I am not using any GUI.
The website is
https://app.aquahawkami.tech/nfc?imei=359986122021410
Before I start to scrape the output, I figure I have to first get a good download, and that is where I am stuck.
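By "a good download" I mean saved HTML that has rendered content outside any JavaScript-disabled fallback such as a `<noscript>` block. A rough way to check that from a script is sketched below. It is only a heuristic under my own assumptions: the class and function names are made up for illustration, and it treats any text outside `<script>`, `<style>`, `<noscript>`, and `<title>` as rendered content. Nothing in it is specific to this site.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside script, style, noscript, or title tags."""

    # Assumption: text inside these tags does not count as rendered page content.
    SKIPPED = {"script", "style", "noscript", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # how many skipped tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the text a reader would see, joined by single spaces."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_rendered(html: str) -> bool:
    """Heuristic: the page counts as rendered if any visible text remains."""
    return bool(visible_text(html))
```

For example, `looks_rendered(open("aquahawk_source.html", encoding="utf-8").read())` could be run on a saved file before any scraping is attempted.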
Here is the axios attempt (Node.js):
// index.js
const axios = require('axios');
const fs = require('fs');

const url = 'https://app.aquahawkami.tech/nfc?imei=359986122021410';

// 'document' is a browser-only responseType; in Node, request the body as text.
axios.get(url, { responseType: 'text' }).then(response => {
  // Save the response body as received from the server.
  fs.writeFile('./wm.html', response.data, (err) => {
    if (err) throw err;
    console.log('The file has been saved!');
  });
});
Here is the Selenium attempt (Python):
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Run Chrome without a display, since this is meant for cron under WSL2.
options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
try:
    driver.get("https://app.aquahawkami.tech/nfc?imei=359986122021410")
    page_source = driver.page_source
finally:
    # quit() ends the whole browser session; close() only closes the current window.
    driver.quit()

print(page_source)
with open("aquahawk_source.html", "w", encoding="utf-8") as file_to_write:
    file_to_write.write(page_source)
Finally, headless Chrome:
`google-chrome --headless --disable-gpu --dump-dom 'https://app.aquahawkami.tech/nfc?imei=359986122021410'`
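Since the goal is a cron job, the headless Chrome command could also be wrapped in a small Python script that saves the dumped DOM to a file. The sketch below assumes `google-chrome` is on the PATH. The function names are hypothetical. The optional `--virtual-time-budget` flag (a headless Chrome switch that lets page scripts run for a given number of milliseconds before the dump) is something I have not confirmed makes a difference for this page.

```python
import subprocess
from typing import List, Optional


def build_dump_dom_command(
    url: str,
    virtual_time_budget_ms: Optional[int] = None,
    chrome: str = "google-chrome",  # assumption: Chrome binary name on PATH
) -> List[str]:
    """Build the argument list for a headless Chrome DOM dump."""
    cmd = [chrome, "--headless", "--disable-gpu"]
    if virtual_time_budget_ms is not None:
        # Untested assumption: a time budget may let page scripts finish first.
        cmd.append(f"--virtual-time-budget={virtual_time_budget_ms}")
    # Passing the URL as its own list item avoids shell quoting issues with '?'.
    cmd += ["--dump-dom", url]
    return cmd


def dump_dom(
    url: str,
    out_path: str,
    virtual_time_budget_ms: Optional[int] = None,
    timeout_s: int = 120,
) -> None:
    """Run headless Chrome and write the dumped DOM to out_path."""
    cmd = build_dump_dom_command(url, virtual_time_budget_ms)
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout_s, check=True
    )
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(result.stdout)
```

A cron-driven script could then call `dump_dom("https://app.aquahawkami.tech/nfc?imei=359986122021410", "aquahawk_source.html", virtual_time_budget_ms=10000)`, where the 10000 ms budget is an arbitrary starting value.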