Code for the seminar “Information Retrieval” (see seminar plan)

| | Content | Resources/Dependencies | Literature |
| --- | --- | --- | --- |
| basic | Corpus, linear search | Shakespeare | IIR ch. 1 |
| boole | Term-document matrix, inverted index, list intersection, positional index, PositionalIntersect | | IIR ch. 1 + 2 |
| preprocess | Preprocessing: tokenization, stemming | Snowball stemmer | IIR ch. 1 + 2 |
| tolerant | Tolerant retrieval: Levenshtein, Soundex | Apache Commons Lang, Apache Commons Codec | IIR ch. 3 |
| ranked | Ranked retrieval: term weighting, vector space model | | IIR ch. 6 + 7 |
| evaluation | Evaluation: precision, recall, F-measure | | IIR ch. 8 |
| web | Crawler, WebDocument | Apache Xerces, NekoHTML | IIR ch. 19 + 20 |
| lucene | Lucene: indexer and searcher | lucene-core, lucene-queryparser, lucene-analyzers-common | Lucene in Action |
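
As a rough illustration of the list intersection named in the `boole` row, the following is a minimal sketch of merging two sorted posting lists. It is not the seminar's own implementation: the class name `PostingIntersect`, the method `intersect`, and the document IDs are assumptions chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not part of the seminar code.
class PostingIntersect {

    /**
     * Intersects two posting lists of document IDs.
     * Assumes both lists are sorted in ascending order without duplicates.
     */
    public static List<Integer> intersect(List<Integer> p1, List<Integer> p2) {
        List<Integer> answer = new ArrayList<>();
        int i = 0;
        int j = 0;
        // Walk both lists in parallel, always advancing the smaller ID.
        while (i < p1.size() && j < p2.size()) {
            int a = p1.get(i);
            int b = p2.get(j);
            if (a == b) {
                answer.add(a); // document contains both terms
                i++;
                j++;
            } else if (a < b) {
                i++;
            } else {
                j++;
            }
        }
        return answer;
    }

    public static void main(String[] args) {
        // Illustrative posting lists for two terms (made-up document IDs).
        List<Integer> first = List.of(1, 2, 4, 11, 31, 45, 173, 174);
        List<Integer> second = List.of(2, 31, 54, 101);
        System.out.println(intersect(first, second)); // prints [2, 31]
    }
}
```

Because both lists are sorted, the merge touches each entry at most once, so the cost grows with the sum of the two list lengths rather than their product.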