I am trying to scrape a website using Puppeteer and Cheerio. I have fetched the HTML of the page I want to scrape with Puppeteer and loaded it into Cheerio.
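const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

async function run() {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  // My own helper: navigates to the target page and returns its HTML source.
  const html = await get_to_page_with_required_source_code(page);
  // Load the HTML into Cheerio so it can be queried with jQuery-style selectors.
  const $ = cheerio.load(html);
  await browser.close();
}

run();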
What I want to do now is remove all elements from the HTML that contain no text. Below is an example.
<div class="abc">
  <img src="..." />
</div>
<div class="def">
  <div class="jkl">
    <span class="ghi">This is a text</span>
  </div>
  <div class="mno">This is another text</div>
</div>
The output for the above HTML should be:
<span class="ghi">This is a text</span>
<div class="mno">This is another text</div>
since these are the only two elements that directly contain text; their wrapper divs hold no text of their own and should be dropped.
How can I accomplish this?