I've been playing with Nutch recently, partly as a way of getting back into Java development. I've got it creating a crawl database and can run Lucene-backed searches through the web interface. It's really fast, which is great. I have this idea of providing specialised search for my own sites, which would involve a lot of customisation to the Nutch web interface. I have yet to get it to compile from source though!
What I'm not able to do yet is update an already crawled database. I can only see how you'd delete the existing database and re-crawl the lot, which can't be right.
For anyone interested, here's my little how-to, which mostly follows someone else's tutorial, with modifications for Nutch 0.9:
Based on the most useful tutorial found so far:
http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html
Assuming the crawl is stored in a directory named "crawl-tinysite".
Crawl URLs into a new database:
bin/nutch crawl urls -dir crawl-tinysite -depth 3
Show statistics on crawl:
bin/nutch readdb crawl-tinysite/crawldb -stats
(Some of the tutorial's commands are no longer valid for Nutch 0.9.)
Show database segments created by Nutch:
bin/nutch readseg -list -dir crawl-tinysite/segments/
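The three commands above could be wrapped in a small shell function so they run in sequence. This is only a rough sketch: the names NUTCH, CRAWL_DIR, DRY_RUN, run and crawl_and_report are made up for this example and are not part of Nutch, and it assumes you are in the Nutch install directory with a "urls" seed directory already set up.

```shell
#!/bin/sh
# Sketch only: wraps the three commands from this post in one function.
# NUTCH, CRAWL_DIR and DRY_RUN are hypothetical variable names for this example.
NUTCH="${NUTCH:-bin/nutch}"
CRAWL_DIR="${CRAWL_DIR:-crawl-tinysite}"

# Run a command, or only print it when DRY_RUN is set to 1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Crawl the seed URLs, then show crawl statistics and the segment list.
# $1 = directory holding the seed URL files, $2 = crawl depth.
crawl_and_report() {
    seeds="$1"
    depth="$2"
    run "$NUTCH" crawl "$seeds" -dir "$CRAWL_DIR" -depth "$depth" &&
    run "$NUTCH" readdb "$CRAWL_DIR/crawldb" -stats &&
    run "$NUTCH" readseg -list -dir "$CRAWL_DIR/segments/"
}
```

After sourcing the file, "crawl_and_report urls 3" runs the same crawl as above. With DRY_RUN set to 1 it only prints the three command lines, which is a cheap way to check them before starting a real crawl.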