TantivyEx


A comprehensive Elixir wrapper for the Tantivy full-text search engine.

TantivyEx provides a complete, type-safe interface to Tantivy, a high-performance full-text search library written in Rust. Build powerful search applications with support for all Tantivy field types, custom tokenizers, schema introspection, and advanced indexing features.

Features

  • High Performance: Built on Tantivy, one of the fastest search engines available
  • Complete Field Type Support: Text, numeric, boolean, date, facet, bytes, JSON, and IP address fields
  • Advanced Text Processing: Comprehensive tokenizer system with 17+ language support, custom regex patterns, n-grams, and configurable text analyzers
  • Intelligent Text Analysis: Stemming, stop word filtering, case normalization, and language-specific processing
  • Dynamic Tokenizer Management: Runtime tokenizer registration, enumeration, and configuration
  • Schema Management: Dynamic schema building with validation and introspection
  • Flexible Storage: In-memory or persistent disk-based indexes
  • Type Safety: Full Elixir typespecs and compile-time safety
  • Advanced Aggregations: Elasticsearch-compatible bucket aggregations (terms, histogram, date_histogram, range) and metric aggregations (avg, min, max, sum, stats, percentiles) with nested sub-aggregations
  • Distributed Search: Multi-node search coordination with load balancing, failover, and configurable result merging
  • Search Features: Full-text search, faceted search, range queries, and comprehensive analytics

Quick Start

# Register tokenizers (recommended for production)
TantivyEx.Tokenizer.register_default_tokenizers()

# Create a schema with custom tokenizers
schema = TantivyEx.Schema.new()
|> TantivyEx.Schema.add_text_field_with_tokenizer("title", :text_stored, "default")
|> TantivyEx.Schema.add_text_field_with_tokenizer("body", :text, "default")
|> TantivyEx.Schema.add_u64_field("id", :indexed_stored)
|> TantivyEx.Schema.add_date_field("published_at", :fast)

# Create an index
# Option 1: Open existing or create new (recommended for production)
{:ok, index} = TantivyEx.Index.open_or_create("/path/to/index", schema)

# Option 2: Create in-memory index for testing
{:ok, index} = TantivyEx.Index.create_in_ram(schema)

# Option 3: Create new persistent index (fails if exists)
{:ok, index} = TantivyEx.Index.create_in_dir("/path/to/index", schema)

# Option 4: Open existing index (fails if doesn't exist)
{:ok, index} = TantivyEx.Index.open("/path/to/index")

# Get a writer
{:ok, writer} = TantivyEx.IndexWriter.new(index, 50_000_000)

# Add documents
doc = %{
  "title" => "Getting Started with TantivyEx",
  "body" => "This is a comprehensive guide to using TantivyEx...",
  "id" => 1,
  "published_at" => "2024-01-15T10:30:00Z"
}

:ok = TantivyEx.IndexWriter.add_document(writer, doc)
:ok = TantivyEx.IndexWriter.commit(writer)

# Search
{:ok, searcher} = TantivyEx.Searcher.new(index)
{:ok, results} = TantivyEx.Searcher.search(searcher, "comprehensive guide", 10)

# Advanced Aggregations (New in v0.2.0)
{:ok, query} = TantivyEx.Query.all()

# Terms aggregation for category analysis
aggregations = %{
  "categories" => %{
    "terms" => %{
      "field" => "category",
      "size" => 10
    }
  }
}

{:ok, agg_results} = TantivyEx.Aggregation.run(searcher, query, aggregations)

# Histogram aggregation for numerical analysis
price_histogram = %{
  "price_ranges" => %{
    "histogram" => %{
      "field" => "price",
      "interval" => 50.0
    }
  }
}

{:ok, histogram_results} = TantivyEx.Aggregation.run(searcher, query, price_histogram)

# Combined search with aggregations
{:ok, search_results, agg_results} = TantivyEx.Aggregation.search_with_aggregations(
  searcher,
  query,
  aggregations,
  20  # search limit
)

# Distributed Search (New in v0.3.0)
{:ok, _supervisor_pid} = TantivyEx.Distributed.OTP.start_link()

# Add multiple search nodes for horizontal scaling
:ok = TantivyEx.Distributed.OTP.add_node("node1", "http://localhost:9200", 1.0)
:ok = TantivyEx.Distributed.OTP.add_node("node2", "http://localhost:9201", 1.5)

# Configure distributed behavior
:ok = TantivyEx.Distributed.OTP.configure(%{
  timeout_ms: 5000,
  max_retries: 3,
  merge_strategy: :score_desc
})

# Perform distributed search
{:ok, results} = TantivyEx.Distributed.OTP.search("search query", 10, 0)

IO.puts("Found #{length(results)} hits across distributed nodes")

Installation

Add tantivy_ex to your list of dependencies in mix.exs:

def deps do
  [
    {:tantivy_ex, "~> 0.3.0"}
  ]
end

TantivyEx requires:

  • Elixir: 1.18 or later
  • Rust: 1.70 or later (for compilation)
  • System: Compatible with Linux, macOS, and Windows

Documentation

Complete Guides

Browse All Guides →

  • Getting Started
  • Development & Production

API Documentation

  • Core Components
  • API Reference

Field Types

TantivyEx supports all Tantivy field types with comprehensive options:

Field Type | Elixir Function | Use Cases | Options
---------- | --------------- | --------- | -------
Text | add_text_field/3 | Full-text search, titles, content | :text, :text_stored, :stored, :fast, :fast_stored
U64 | add_u64_field/3 | IDs, counts, timestamps | :indexed, :indexed_stored, :fast, :stored, :fast_stored
I64 | add_i64_field/3 | Signed integers, deltas | :indexed, :indexed_stored, :fast, :stored, :fast_stored
F64 | add_f64_field/3 | Prices, scores, coordinates | :indexed, :indexed_stored, :fast, :stored, :fast_stored
Bool | add_bool_field/3 | Flags, binary states | :indexed, :indexed_stored, :fast, :stored, :fast_stored
Date | add_date_field/3 | Timestamps, publication dates | :indexed, :indexed_stored, :fast, :stored, :fast_stored
Facet | add_facet_field/2 | Categories, hierarchical data | Always indexed and stored
Bytes | add_bytes_field/3 | Binary data, hashes, images | :indexed, :indexed_stored, :fast, :stored, :fast_stored
JSON | add_json_field/3 | Structured objects, metadata | :text, :text_stored, :stored
IP Address | add_ip_addr_field/3 | IPv4/IPv6 addresses | :indexed, :indexed_stored, :fast, :stored, :fast_stored
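
As an illustration, the sketch below combines several of these field types in one schema and a matching document. The facet path syntax ("/parent/child") and the string form for IP values are assumptions based on common Tantivy conventions, not APIs confirmed above.

# Hedged sketch: schema and document touching facet, IP, and JSON fields
schema = TantivyEx.Schema.new()
|> TantivyEx.Schema.add_text_field("title", :text_stored)
|> TantivyEx.Schema.add_facet_field("category")
|> TantivyEx.Schema.add_ip_addr_field("source_ip", :indexed_stored)
|> TantivyEx.Schema.add_json_field("attributes", :stored)

doc = %{
  "title" => "Wireless headphones",
  "category" => "/electronics/audio",  # assumed facet path syntax
  "source_ip" => "192.168.1.10",       # assumed string form for IP values
  "attributes" => %{"color" => "black", "weight_g" => 250}
}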

Custom Tokenizers

New in v0.2.0: TantivyEx now provides comprehensive tokenizer support with advanced text analysis capabilities:

# Register default tokenizers (recommended starting point)
TantivyEx.Tokenizer.register_default_tokenizers()

# Create custom text analyzers with advanced processing
TantivyEx.Tokenizer.register_text_analyzer(
  "english_full",
  "simple",           # base tokenizer
  true,               # lowercase
  "english",          # stop words language
  "english",          # stemming language
  50                  # remove tokens longer than 50 chars
)

# Register specialized tokenizers for different use cases
TantivyEx.Tokenizer.register_regex_tokenizer("email", "\\b[\\w._%+-]+@[\\w.-]+\\.[A-Za-z]{2,}\\b")
TantivyEx.Tokenizer.register_ngram_tokenizer("fuzzy_search", 2, 4, false)

# Add fields with specific tokenizers
schema = TantivyEx.Schema.new()
|> TantivyEx.Schema.add_text_field_with_tokenizer("content", :text, "english_full")
|> TantivyEx.Schema.add_text_field_with_tokenizer("product_code", :text_stored, "simple")
|> TantivyEx.Schema.add_text_field_with_tokenizer("tags", :text, "whitespace")

Available Tokenizer Types

  • Simple Tokenizer: Basic punctuation and whitespace splitting with lowercase normalization
  • Whitespace Tokenizer: Splits only on whitespace, preserves case and punctuation
  • Regex Tokenizer: Custom pattern-based tokenization for specialized formats
  • N-gram Tokenizer: Character or word n-grams for fuzzy search and autocomplete (see the sketch after this list)
  • Text Analyzer: Advanced processing with configurable filters (stemming, stop words, case normalization)
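
A minimal sketch for the n-gram case, referenced from the list above: it registers a character n-gram tokenizer (2 to 4 characters) and attaches it to a field for autocomplete-style matching. It uses only register_ngram_tokenizer/4 and add_text_field_with_tokenizer/3 as shown earlier; the tokenizer name "autocomplete" is illustrative.

# Register a 2-4 character n-gram tokenizer and use it on a name field
TantivyEx.Tokenizer.register_ngram_tokenizer("autocomplete", 2, 4, false)

schema = TantivyEx.Schema.new()
|> TantivyEx.Schema.add_text_field_with_tokenizer("name", :text_stored, "autocomplete")

Note that short n-grams grow the index quickly, so keep the size range narrow for large corpora.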

Multi-Language Support

Built-in support for 17+ languages with language-specific stemming and stop words:

# Language-specific analyzers
TantivyEx.Tokenizer.register_language_analyzer("english")  # -> "english_text"
TantivyEx.Tokenizer.register_language_analyzer("french")   # -> "french_text"
TantivyEx.Tokenizer.register_language_analyzer("german")   # -> "german_text"

# Or just stemming
TantivyEx.Tokenizer.register_stemming_tokenizer("spanish") # -> "spanish_stem"

Supported languages: English, French, German, Spanish, Italian, Portuguese, Russian, Arabic, Danish, Dutch, Finnish, Greek, Hungarian, Norwegian, Romanian, Swedish, Tamil, Turkish
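
If an application indexes several languages, the analyzers can be registered in one pass at startup; a minimal sketch using only register_language_analyzer/1 from above.

# Register one analyzer per language the application indexes
Enum.each(["english", "french", "german", "spanish"], fn lang ->
  TantivyEx.Tokenizer.register_language_analyzer(lang)
end)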

Performance Tips

  1. Use appropriate field options: Only index fields you need to search
  2. Leverage fast fields: Use :fast for sorting and aggregations
  3. Batch operations: Commit documents in batches for better performance (see the sketch after this list)
  4. Memory management: Tune writer memory budget based on your use case
  5. Choose optimal tokenizers: Match tokenizers to your search requirements
    • Use "default" for natural language content with stemming
    • Use "simple" for structured data like product codes
    • Use "whitespace" for tag fields and technical terms
    • Use "keyword" for exact matching fields
  6. Language-specific optimization: Use language analyzers for better search quality
  7. Register tokenizers once: Call register_default_tokenizers() at application startup
  8. Distributed search optimization:
    • Weight nodes based on their actual capacity
    • Use the :score_desc merge strategy for best relevance
    • Monitor cluster health and adjust timeouts accordingly
    • Consider health-based load balancing for production environments
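
For tip 3, a minimal batching sketch built only from IndexWriter calls shown in the Quick Start. The batch size of 1_000 is an arbitrary illustration, and docs is assumed to be an enumerable of maps matching the schema.

# Commit once per chunk instead of once per document
{:ok, writer} = TantivyEx.IndexWriter.new(index, 50_000_000)

docs
|> Enum.chunk_every(1_000)
|> Enum.each(fn batch ->
  Enum.each(batch, fn doc ->
    :ok = TantivyEx.IndexWriter.add_document(writer, doc)
  end)
  :ok = TantivyEx.IndexWriter.commit(writer)
end)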

Examples

E-commerce Search

# Product search schema
schema = TantivyEx.Schema.new()
|> TantivyEx.Schema.add_text_field("name", :text_stored)
|> TantivyEx.Schema.add_text_field("description", :text)
|> TantivyEx.Schema.add_f64_field("price", :fast_stored)
|> TantivyEx.Schema.add_facet_field("category")
|> TantivyEx.Schema.add_bool_field("in_stock", :fast)
|> TantivyEx.Schema.add_u64_field("product_id", :indexed_stored)
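
Indexing and querying against this schema follows the Quick Start pattern; a hedged sketch, assuming index was opened with the schema above and that facet values use a "/parent/child" path syntax.

# Index one product and run a basic relevance search
{:ok, writer} = TantivyEx.IndexWriter.new(index, 50_000_000)

:ok = TantivyEx.IndexWriter.add_document(writer, %{
  "name" => "Mechanical keyboard",
  "description" => "Tenkeyless board with hot-swappable switches",
  "price" => 89.99,
  "category" => "/peripherals/keyboards",  # assumed facet path syntax
  "in_stock" => true,
  "product_id" => 42
})
:ok = TantivyEx.IndexWriter.commit(writer)

{:ok, searcher} = TantivyEx.Searcher.new(index)
{:ok, results} = TantivyEx.Searcher.search(searcher, "mechanical keyboard", 10)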

Blog Search

# Setup tokenizers for blog content
TantivyEx.Tokenizer.register_default_tokenizers()
TantivyEx.Tokenizer.register_language_analyzer("english")

# Blog post schema with optimized tokenizers
schema = TantivyEx.Schema.new()
|> TantivyEx.Schema.add_text_field_with_tokenizer("title", :text_stored, "english_text")
|> TantivyEx.Schema.add_text_field_with_tokenizer("content", :text, "english_text")
|> TantivyEx.Schema.add_text_field_with_tokenizer("tags", :text_stored, "whitespace")
|> TantivyEx.Schema.add_date_field("published_at", :fast_stored)
|> TantivyEx.Schema.add_u64_field("author_id", :fast)

Log Analysis

# Application log schema
schema = TantivyEx.Schema.new()
|> TantivyEx.Schema.add_text_field("message", :text_stored)
|> TantivyEx.Schema.add_text_field("level", :text)
|> TantivyEx.Schema.add_ip_addr_field("client_ip", :indexed_stored)
|> TantivyEx.Schema.add_date_field("timestamp", :fast_stored)
|> TantivyEx.Schema.add_json_field("metadata", :stored)
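
A sketch of indexing a single log entry against this schema, assuming a writer as in the Quick Start; the string forms for the IP and timestamp values mirror the earlier date example and are otherwise assumptions.

# One structured log entry; metadata lands in the JSON field
log_entry = %{
  "message" => "GET /health returned 200",
  "level" => "info",
  "client_ip" => "10.0.0.7",  # assumed string form for IP values
  "timestamp" => "2024-01-15T10:30:00Z",
  "metadata" => %{"service" => "api", "request_id" => "abc123"}
}

:ok = TantivyEx.IndexWriter.add_document(writer, log_entry)
:ok = TantivyEx.IndexWriter.commit(writer)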

Distributed Search Setup

# Multi-node search coordinator for horizontal scaling
{:ok, _supervisor_pid} = TantivyEx.Distributed.OTP.start_link()

# Add nodes with different weights based on capacity
:ok = TantivyEx.Distributed.OTP.add_node("primary", "http://search1:9200", 2.0)
:ok = TantivyEx.Distributed.OTP.add_node("secondary", "http://search2:9200", 1.5)
:ok = TantivyEx.Distributed.OTP.add_node("backup", "http://search3:9200", 1.0)

# Configure for production use
:ok = TantivyEx.Distributed.OTP.configure(%{
  timeout_ms: 10_000,
  max_retries: 3,
  merge_strategy: :score_desc
})

# Perform distributed search
{:ok, results} = TantivyEx.Distributed.OTP.search("search query", 50, 0)

# Monitor cluster health
{:ok, cluster_stats} = TantivyEx.Distributed.OTP.get_cluster_stats()
IO.puts("Active nodes: #{cluster_stats.active_nodes}/#{cluster_stats.total_nodes}")

Development Setup

git clone https://github.com/alexiob/tantivy_ex.git
cd tantivy_ex
mix deps.get
mix test

License

TantivyEx is licensed under the Apache License 2.0. See LICENSE for details.

Acknowledgments

  • Tantivy: The powerful Rust search engine that powers TantivyEx
  • Rustler: For excellent Rust-Elixir integration
