Tutorial: Nutch
The following information was contributed by Praful Bagai.
This tutorial assumes that you are customizing the Reuters tutorial.
In `reuters.js`, update the Solr parameters in `var params` to reflect the structure of your Solr documents:

- Update `facet.field` with the fields on which you want to facet
- Remove `f.topics.facet.limit` and `f.countryCodes.facet.limit` unless your Solr documents have `topics` or `countryCodes` fields
- Remove all `facet.date` parameters unless your Solr documents have a date field on which you want to facet
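
As a rough illustration of the list above, a trimmed-down `params` object might look like the following sketch. The facet fields `host` and `type` are hypothetical placeholders, the `buildParams` helper is not part of the Reuters demo, and the `facet.limit`, `facet.mincount` and `json.nl` values are assumed defaults; adjust all of them to your own schema.

```javascript
// Sketch of Solr parameters for a Nutch index. The field names passed in
// are hypothetical; use fields that exist in your own Solr schema.
function buildParams(facetFields) {
  return {
    facet: true,
    'facet.field': facetFields,
    'facet.limit': 20,      // assumed value
    'facet.mincount': 1,    // assumed value
    'json.nl': 'map'
    // No f.topics.facet.limit, f.countryCodes.facet.limit or facet.date.*
    // entries, since this hypothetical index has none of those fields.
  };
}

var params = buildParams(['host', 'type']);

// In reuters.js the parameters would then be copied into the Manager's store:
// for (var name in params) {
//   Manager.store.addByValue(name, params[name]);
// }
```

Keeping the field list as an argument makes it easy to try different facets without touching the rest of the parameters.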
Either update or remove the tag cloud, autocomplete, country code and calendar widgets. For the tag cloud, you can set the associated Solr fields by changing the value of `var fields`.
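
A minimal sketch of the tag cloud change, assuming each field gets one widget whose `id`, `target` and `field` options are all derived from the field name. The fields `host` and `type` and the `tagcloudOptions` helper are hypothetical, and each `target` selector assumes a matching element exists in your page.

```javascript
// Hypothetical Solr fields to show as tag clouds.
var fields = ['host', 'type'];

// Build one options object per field: the widget id, the CSS selector of
// the element that will hold the tag cloud, and the Solr field to facet on.
function tagcloudOptions(fieldNames) {
  return fieldNames.map(function (field) {
    return { id: field, target: '#' + field, field: field };
  });
}

var options = tagcloudOptions(fields);

// With AJAX Solr loaded, each entry could be handed to a widget, e.g.:
// for (var i = 0; i < options.length; i++) {
//   Manager.addWidget(new AjaxSolr.TagcloudWidget(options[i]));
// }
```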
Nutch uses a `content` field instead of a `text` field like in the Reuters demo. In `reuters.theme.js`, in the `AjaxSolr.theme.prototype.snippet` function, replace `doc.text` with `doc.content`. Nutch has no `dateline` field, so remove `doc.dateline + ' ' +`.
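
The edited function might end up looking roughly like this standalone sketch. It assumes the snippet logic truncates long text at 300 characters and hides the remainder behind a "more" link; check your copy of `reuters.theme.js` for the exact markup, since the details here are an assumption.

```javascript
// Standalone sketch of a snippet function that reads doc.content and
// no longer references doc.dateline. The 300-character limit is assumed.
function snippet(doc) {
  var limit = 300;
  var output = '';
  if (doc.content.length > limit) {
    // Show the first part and keep the rest in a hidden span.
    output += doc.content.substring(0, limit);
    output += '<span style="display:none;">' + doc.content.substring(limit);
    output += '</span> <a href="#" class="more">more</a>';
  } else {
    output += doc.content;
  }
  return output;
}

// In reuters.theme.js this body would be assigned to
// AjaxSolr.theme.prototype.snippet rather than a free function.
```

Note that a document without a `content` value would make `doc.content.length` throw, which is one reason to check the `fetcher.store.content` setting in your Nutch configuration.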
Check the following properties in your `nutch-default.xml`:

    <property>
      <name>fetcher.store.content</name>
      <value>true</value>
      <description>If true, fetcher will store content.</description>
    </property>
    <property>
      <name>parser.caching.forbidden.policy</name>
      <value>content</value>
      <description>If a site (or a page) requests through its robot metatags
      that it should not be shown as cached content, apply this policy. Currently
      three keywords are recognized: "none" ignores any "noarchive" directives.
      "content" doesn't show the content, but shows summaries (snippets).
      "all" doesn't show either content or summaries.</description>
    </property>

You may also need to copy fields from your Nutch schema to your Solr schema.