Crawling any web page with HtmlUnit does not work on JavaScript pages

I am developing a web crawler that receives any given domain, crawls the main page, builds a list of all the sub-links on that page, and crawls those too.
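
To make the intended crawl loop concrete, here is a minimal sketch of it using only the Java standard library. The class name SiteCrawler, the maxPages limit, the same-host restriction, and the linkExtractor parameter are my own assumptions for illustration; the extractor stands in for whatever strategy ends up finding the links on a page.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical breadth-first crawl loop. The link extractor is passed in,
// so any strategy (HtmlUnit, regex, ...) can be plugged in.
class SiteCrawler {

    static List<String> crawl(String startUrl, int maxPages,
                              Function<String, List<String>> linkExtractor) {
        URI start = URI.create(startUrl);
        String host = start.getHost();
        Set<String> seen = new HashSet<>();       // URLs already queued
        Deque<String> queue = new ArrayDeque<>(); // URLs waiting to be crawled
        List<String> visited = new ArrayList<>(); // URLs crawled, in order

        seen.add(start.toString());
        queue.add(start.toString());

        while (!queue.isEmpty() && visited.size() < maxPages) {
            String current = queue.poll();
            visited.add(current);

            for (String href : linkExtractor.apply(current)) {
                URI resolved;
                try {
                    // Turn relative links into absolute ones.
                    resolved = URI.create(current).resolve(href);
                } catch (IllegalArgumentException e) {
                    continue; // skip malformed links
                }
                // Assumption: only follow links on the starting host.
                if (host == null || !host.equalsIgnoreCase(resolved.getHost())) {
                    continue;
                }
                // Drop the fragment so /a and /a#top count as one page.
                String url = resolved.toString();
                int hash = url.indexOf('#');
                if (hash >= 0) {
                    url = url.substring(0, hash);
                }
                if (seen.add(url)) {
                    queue.add(url);
                }
            }
        }
        return visited;
    }
}
```

Calling SiteCrawler.crawl("http://example.com/", 100, extractor) would return the pages visited in breadth-first order, where extractor is the part I am still missing.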

I have already written the part that retrieves the main content of any given page: I use HtmlUnit to get the HTML and boilerpipe to identify the main content, with pretty accurate results.

Now I'm facing the problem of identifying all the sub-links of that page. The biggest difficulty is that every web page has its own HTML structure. I've tried the following approaches:

  • Searching for all the anchor (a) tags: this was my first idea and it
    worked pretty well. The problem came when I tried to crawl pages that
    rely on JavaScript, as they don't use anchor tags but div tags with
    onclick attributes:
    onclick="widgetEvCall('handlers.openResult', event, this,
    '/Attraction_Review-g187497-d670716-Reviews-Barcelona_Bus_Turistic-Barcelona_Catalonia.html')"
  • Searching for all the tags with onclick attributes: this solution
    didn't work either, as not all websites use that attribute.
  • Getting the button.click() response: the problem here is that not
    all the elements that redirect to a link are buttons; some of them
    are just divs.
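
For reference, the onclick approach from the second bullet can be sketched roughly like this. It is a simplified, standard-library-only version: OnclickLinkExtractor and extractLinks are hypothetical names, and the regex assumes the target URL appears in the handler as a quoted string literal starting with "/" or "http(s)://", which holds for the example above but certainly not for every site.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlUnit or JSoup.
class OnclickLinkExtractor {

    // Matches a single- or double-quoted string literal whose content
    // starts with "/" or "http(s)://" and contains no quotes or whitespace.
    private static final Pattern URL_LITERAL =
            Pattern.compile("(['\"])((?:https?://|/)[^'\"\\s]+)\\1");

    // Returns every URL-like string literal found in an onclick value.
    static List<String> extractLinks(String onclickValue) {
        List<String> links = new ArrayList<>();
        if (onclickValue == null) {
            return links;
        }
        Matcher m = URL_LITERAL.matcher(onclickValue);
        while (m.find()) {
            links.add(m.group(2)); // group 2 is the literal without quotes
        }
        return links;
    }
}
```

Applied to the widgetEvCall(...) value shown in the first bullet, this picks out the /Attraction_Review-... path and ignores 'handlers.openResult'. It finds nothing on pages that attach their click handlers from script instead of through an attribute, which is exactly where I am stuck.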

I know that JSoup can do that pretty easily, but it crashes when it encounters JavaScript elements. At this point I've run out of ideas. Could anyone help me with this task?