I am trying to use Goutte to scrape Google SERPs. Specifically, I want to get the URL of each search result, but I am finding this difficult with Goutte, mainly because of how the markup is laid out. Here is an example taken from the Google SERP source code:
<div class="TbwUpd NJjxre">
<cite class="iUh30 qLRx3b tjvcx" role="text">https://blog.apilayer.com
<span class="dyjrff qzEoUe" role="text"> › search</span></cite></div>
I want to get the URL https://blog.apilayer.com, but I don't know how to write the Goutte code to do this. This is what I did for the titles of the search results, and it works fine:
use Goutte\Client;

// Build the search URL; $query holds the search terms
$url = 'https://www.google.com/search?' . http_build_query(['q' => $query]);
// Request search results
$client = new Client();
$crawler = $client->request('GET', $url);
// Collect the text of every <h3>, i.e. the result titles
return $crawler->filter('h3')->each(function ($node) {
    return $node->text();
});
The above is easy since the title is wrapped in an <h3> tag, like this:
<h3 class="LC20lb MBeuO DKV0Md">A Complete Guide to Google Search Result Scraping</h3>
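Getting the URL from the search result is harder, because the URL is not the only content of its tag: the <cite> element holds the URL as bare text and also contains a nested <span> with the breadcrumb (" › search"), so taking the text of <cite> returns both.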
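To make the goal concrete, here is a minimal sketch of the extraction I am after. It does not use Goutte; it uses PHP's built-in DOMDocument and DOMXPath on the sample markup above. The function name extractCiteUrls is made up for illustration, and the sketch assumes the URL is always the first text node directly inside <cite>, which may not hold for other results or if Google changes its markup.

```php
<?php
// Sample markup copied from the question; real Google markup may differ.
$html = <<<HTML
<div class="TbwUpd NJjxre">
<cite class="iUh30 qLRx3b tjvcx" role="text">https://blog.apilayer.com
<span class="dyjrff qzEoUe" role="text"> › search</span></cite></div>
HTML;

/**
 * Hypothetical helper: returns the leading text of every <cite> element,
 * ignoring text that sits inside nested elements such as <span>.
 */
function extractCiteUrls(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    // The XML prolog is a common trick to make loadHTML treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $urls = [];
    // text()[1] selects the first text node that is a direct child of each <cite>.
    foreach ($xpath->query('//cite/text()[1]') as $textNode) {
        $url = trim($textNode->nodeValue);
        if ($url !== '') {
            $urls[] = $url;
        }
    }
    return $urls;
}

print_r(extractCiteUrls($html));
```

For the sample markup this should give an array containing only https://blog.apilayer.com, without the " › search" part. What I am missing is the equivalent of this using the Goutte crawler.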