PHPhinder is an open-source, lightweight, and modular search engine designed for PHP applications. It provides powerful search capabilities with a focus on simplicity, speed, and extensibility.
The PHPhinder bundle connects PHPhinder with Symfony to improve the searchability of Doctrine entities.
Install PHPhinder via Composer:
composer require eliasfernandez/phphinder-bundle
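With Symfony Flex, the bundle should be registered automatically during installation. If it is not, register it manually in config/bundles.php. The class name below is only a guess based on the package name, so check the package source for the exact FQCN:

```php
// config/bundles.php
return [
    // ...
    // Hypothetical FQCN: verify the bundle class shipped by eliasfernandez/phphinder-bundle.
    PHPhinder\Symfony\PHPhinderBundle::class => ['all' => true],
];
```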
Imagine a Book entity we want to make searchable. This is how it could look:
class Book
{
    #[ORM\Id]
    #[ORM\GeneratedValue]
    #[ORM\Column]
    #[PHPhinder\Property(Schema::IS_UNIQUE | Schema::IS_INDEXED | Schema::IS_REQUIRED | Schema::IS_STORED)]
    private ?int $id = null;

    #[ORM\Column(length: 255)]
    #[PHPhinder\Property(Schema::IS_INDEXED | Schema::IS_STORED | Schema::IS_REQUIRED | Schema::IS_FULLTEXT)]
    private ?string $title = null;

    #[ORM\Column(type: Types::SIMPLE_ARRAY)]
    private array $authors = [];

    #[ORM\Column(type: Types::TEXT)]
    #[PHPhinder\Property(Schema::IS_INDEXED)]
    private ?string $description = null;

    // ...

    #[PHPhinder\Property(Schema::IS_INDEXED | Schema::IS_REQUIRED, name: 'authors')]
    public function getAuthorsCsv(): string
    {
        return implode(', ', $this->authors);
    }
}
On the controller side, we need to configure the search engine to look for Book objects:
private SearchEngine $searchEngine;

public function __construct(private StorageFactory $storageFactory, private SchemaGenerator $schemaGenerator)
{
    $this->searchEngine = new SearchEngine(
        $this->storageFactory->createStorage(
            $this->schemaGenerator->generate(Book::class)
        )
    );
}
Then, in the actions, we can get results with a single method call:
#[Route('/search', name: 'app_search', methods: ['GET', 'POST'])]
public function index(Request $request): Response
{
    $query = $request->query->get('q', '');
    $results = [];
    if ($query) {
        $results = $this->searchEngine->search($query);
    }

    return $this->render('search/index.html.twig', [
        'query' => $query,
        'results' => $results,
    ]);
}
You can define where the indexes will be stored by configuring these two variables (a configuration sketch follows further below):
- `phphinder.storage` can be `dbal` (search entities live in a database), `redis` (entities are stored in a Redis instance) or `json` (data is stored in files).
- `phphinder.name`: depending on the chosen storage, the name is used either as a folder inside the project's `var` directory (when the storage is `json`) or as the database defined by a DBAL DSN connection string.
If `auto_sync` is enabled, every time a searchable entity is added or updated the search engine stores the necessary indexes.
Attention: this can have a very high impact on systems with a high volume of database writes.
When `sync_in_background` is enabled, a message is sent to the queue, and the message handler triggers an event to index the entity in the search engine. This process is nearly immediate, enabling the loading of 100,000 records in approximately 2 minutes. The actual message consumption takes longer but does not impact entity generation, and it can be done in parallel by as many consumers as needed.
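As a rough illustration, this is what the configuration could look like in Symfony's PHP configuration format. The option names are taken from the parameters described above, but the exact keys and structure are assumptions, so treat this as a sketch rather than the bundle's canonical configuration:

```php
// config/packages/phphinder.php (sketch only; option names assumed from the parameters above)
use Symfony\Component\DependencyInjection\Loader\Configurator\ContainerConfigurator;

return static function (ContainerConfigurator $container): void {
    $container->extension('phphinder', [
        'storage' => 'dbal',   // dbal | redis | json
        // For dbal storage: a DSN connection string; for json: a folder name under var/
        'name' => 'postgresql://app:secret@127.0.0.1:5432/search',
        'auto_sync' => true,            // index searchable entities on every persist/update
        'sync_in_background' => true,   // defer indexing to the message queue
    ]);
};
```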
As with any search engine, index creation is significantly slower than generating entities in the database. Indexing involves splitting documents into tokens, transforming these tokens into relevant search strings, aggregating a fuzzy search state index, and (the most time-consuming part) persisting these calculations to storage.
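To make that cost concrete, here is a deliberately simplified sketch of those steps in plain PHP. It is not PHPhinder's implementation, only an illustration of the pipeline described above and of why the final persistence step tends to dominate:

```php
// Conceptual illustration only: not PHPhinder's actual code.
$document = ['id' => 1, 'title' => 'Example title', 'description' => 'Example description text'];

// 1. Split the document into tokens.
$text = mb_strtolower($document['title'] . ' ' . $document['description']);
$tokens = preg_split('/\W+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

// 2. Transform tokens into relevant search strings (stop-word removal, stemming, ...).
$terms = array_unique(array_filter($tokens, fn (string $t) => mb_strlen($t) > 2));

// 3. Aggregate the index state (here, a simple term => document ids map).
$index = [];
foreach ($terms as $term) {
    $index[$term][] = $document['id'];
}

// 4. Persist the result. This is usually the most expensive step, because each batch has to
//    merge its terms into structures that already live on disk, in the database, or in Redis.
file_put_contents('/tmp/index.json', json_encode($index, JSON_PRETTY_PRINT));
```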
By default, this bundle triggers the indexing mechanism on entity persistence. While convenient, this approach can noticeably affect overall application performance. To mitigate this, indexing logic can be moved to queue processing using the `sync_in_background` parameter.
Performance also varies based on the storage backend. Let's evaluate some configurations. Using the phphinder-project, we measure the time required to create the first 3,000 books and then test search performance for the common keyword `human`. These metrics provide insights into indexing and search speed. The test environment:
- Processor: AMD Ryzen 7 5700G with Radeon Graphics (16 cores)
- Memory: 32GB RAM
xychart-beta
title "JSON"
x-axis "Books Inserted" 100 --> 3000
y-axis "Time in ms" 500 --> 300000
line [1485, 2067, 3457, 5188, 8956, 8915, 13680, 13053, 18001, 16520, 24276, 28433, 33869, 39637, 51672, 59923, 62133, 62179, 65565, 61010, 107375, 86390, 67906, 108845, 99148, 131471, 121568, 120736, 256062, 172580]
Indexing conclusion: very slow.
It is fast for the first entities but degrades steadily as the index grows: the larger the files, the longer it takes to write to them.
Search conclusion: very fast.
It takes only 106ms to load `human` search results.
xychart-beta
title "DBAL (PostgreSQL)"
x-axis "Books Inserted" 100 --> 3000
y-axis "Time in ms" 500 --> 10000
line [6844, 5716, 6487, 6955, 8665, 7076, 8576, 7154, 7558, 6448, 7790, 7617, 7835, 7921, 8936, 9055, 8858, 7971, 7661, 6206, 8343, 7033, 6102, 7190, 6491, 7886, 7578, 7092, 8666, 5999]
Indexing conclusion: slow, but it does not degrade.
It takes an average of 7.5 seconds to process 100 entities and maintains this performance regardless of the number of entities.
Search conclusion: very fast.
It takes 106ms to load `human` search results, similar to JSON storage.
xychart-beta
title "Redis"
x-axis "Books Inserted" 100 --> 3000
y-axis "Time in ms" 500 --> 2500
line [1117, 1000, 1115, 1273, 2377, 1331, 1616, 1373, 1483, 1293, 1543, 1520, 1621, 1630, 1821, 1863, 1808, 1690, 1679, 1399, 1808, 1590, 1402, 1583, 1484, 1755, 1694, 1698, 1948, 1494]
Indexing conclusion: the fastest option.
It averages 1.5 seconds per 100 entities and maintains this speed regardless of the total number of entities.
Search conclusion: slow.
It takes 700ms to load `human` search results. This could be related to the implementation of the Redis storage or to limitations in Redis itself. Performance remains consistent even with 100,000 books.
The chart below overlays the three series from above: JSON, DBAL, and Redis.
xychart-beta
title "In Comparison"
x-axis "Books Inserted" 100 --> 3000
y-axis "Time in ms" 500 --> 300000
line [1485, 2067, 3457, 5188, 8956, 8915, 13680, 13053, 18001, 16520, 24276, 28433, 33869, 39637, 51672, 59923, 62133, 62179, 65565, 61010, 107375, 86390, 67906, 108845, 99148, 131471, 121568, 120736, 256062, 172580]
line [6844, 5716, 6487, 6955, 8665, 7076, 8576, 7154, 7558, 6448, 7790, 7617, 7835, 7921, 8936, 9055, 8858, 7971, 7661, 6206, 8343, 7033, 6102, 7190, 6491, 7886, 7578, 7092, 8666, 5999]
line [1117, 1000, 1115, 1273, 2377, 1331, 1616, 1373, 1483, 1293, 1543, 1520, 1621, 1630, 1821, 1863, 1808, 1690, 1679, 1399, 1808, 1590, 1402, 1583, 1484, 1755, 1694, 1698, 1948, 1494]
Looking at the results, JSON is not recommended as a storage option except in specific cases:
- Fewer than 5,000 documents need indexing.
- The application is static and infrequently updated.
- Database usage is restricted or costly.
For write-intensive applications where search speed is less critical, Redis is the best choice. However, for most use cases, DBAL provides a good balance of indexing speed, search performance, and reliability. With `sync_in_background`, DBAL is fast enough for indexing, very fast for searches, and highly reliable.
Try it out with the PHPhinder project!