Nutch Search

I am looking for a vertical search engine, built as a simple
three-column page, that includes:
1- A web script that scrapes the news from 15 (Arabic) newspapers
automatically every morning and stores it in an SQL database on the server.
It can use the open-source Lucene/Nutch 1.2 or any other script you are experienced with.

2- A powerful control panel.

3- A search box to search the data stored in the SQL database.

The search results will show the latest news first, with:
A- the title, linked to the original source (opening in a new window)
B- the scrape date
C- a description (about 200 characters)
D- the search keyword highlighted.
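To make the expected result format concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. It assumes a hypothetical table `news(title, url, scraped_at, body)` with `scraped_at` stored as ISO text (`YYYY-MM-DD HH:MM:SS`) so that sorting by it gives newest first. It uses a plain SQL `LIKE` match as a stand-in for a real Lucene/Nutch full-text index, and it is not the required implementation.

```python
import html
import re
import sqlite3


def make_snippet(text, keyword, width=200):
    """Return about `width` characters of `text` around the first match,
    HTML-escaped, with each occurrence of `keyword` wrapped in <mark>."""
    pos = text.lower().find(keyword.lower())
    start = max(0, pos - width // 2) if pos >= 0 else 0
    escaped = html.escape(text[start:start + width])
    pattern = re.compile(re.escape(html.escape(keyword)), re.IGNORECASE)
    return pattern.sub(lambda m: "<mark>" + m.group(0) + "</mark>", escaped)


def search_news(conn, keyword, limit=20):
    """Search the hypothetical `news` table, newest items first.

    Note: '%' and '_' inside `keyword` are not escaped and act as LIKE
    wildcards; a real deployment would use a full-text index instead."""
    like = f"%{keyword}%"
    rows = conn.execute(
        "SELECT title, url, scraped_at, body FROM news "
        "WHERE title LIKE ? OR body LIKE ? "
        "ORDER BY scraped_at DESC LIMIT ?",
        (like, like, limit),
    ).fetchall()
    results = []
    for title, url, scraped_at, body in rows:
        results.append({
            # A: title linked to the original source, opened in a new window
            "link": (f'<a href="{html.escape(url)}" target="_blank" '
                     f'rel="noopener">{html.escape(title)}</a>'),
            # B: scrape date
            "date": scraped_at,
            # C + D: ~200-character description with the keyword highlighted
            "description": make_snippet(body, keyword),
        })
    return results
```

The control panel and the nightly scraper would only need to fill the same table; the search page then renders each result dictionary as one row.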

If possible, the spider should exclude parts of the pages identified by
(div id), (div class), (span id), (span class), etc.
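As an illustration of excluding page parts by tag plus `id` or `class`, here is a standalone sketch built on Python's standard-library `html.parser`, outside of Nutch. The rule set `EXCLUDE` and its values (`comments`, `ads`, `share`) are hypothetical examples. The nesting counter assumes reasonably well-formed markup where every non-void element is closed; real newspaper pages with unclosed tags would need a more forgiving parser.

```python
from html.parser import HTMLParser

# Hypothetical exclusion rules: (tag, attribute, value) triples.
EXCLUDE = {("div", "id", "comments"), ("div", "class", "ads"),
           ("span", "class", "share")}
# Elements that never have an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
ALWAYS_SKIP = {"script", "style"}  # never index code or CSS


class ExcludingTextExtractor(HTMLParser):
    """Collects visible text, skipping elements that match the rules."""

    def __init__(self, rules):
        super().__init__(convert_charrefs=True)
        self.rules = rules
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.parts = []

    def _excluded(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "class":
                # An element may carry several space-separated classes.
                if any((tag, "class", c) in self.rules for c in value.split()):
                    return True
            elif name == "id" and (tag, "id", value) in self.rules:
                return True
        return False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1  # nested element inside excluded region
        elif tag in ALWAYS_SKIP or self._excluded(tag, attrs):
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def visible_text(html_doc, rules=EXCLUDE):
    """Return the page text with excluded blocks removed."""
    parser = ExcludingTextExtractor(rules)
    parser.feed(html_doc)
    parser.close()
    return " ".join(parser.parts)
```

The rules could be edited from the control panel per newspaper, since each site marks its ads and comment blocks differently.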

If you have done a similar project before, I would
appreciate it if you could show it to me.
