Migrating symfony.com Search Engine to Meilisearch

Searching Symfony documentation is one of the most important features of
symfony.com. Search is used so often by our visitors that every page includes a
keyboard shortcut to open the search input: Ctrl + K (or Command + K).
We also support pressing the Shift key twice, similar to the search feature of some IDEs.

Since this website was launched, we’ve used different products and services to
solve the search problem: Apache Solr, Elasticsearch, Google Custom Search Engine,
and Algolia Search. This week we unveiled the new search experience based on
Meilisearch.

Migration Reasoning

The first main reason to migrate is that we needed better control over the parsing
and indexing of content. Using a fully hosted service is convenient, but the
configuration via JSON files was a bit cumbersome and didn’t give us the
full flexibility that we needed.

The second main reason is that the quality of the search results wasn’t as high
as our readers expected. We received complaints about confusing search results and
about completely missing results. Because of the lack of control over the parsing
and indexing mentioned earlier, we couldn’t improve these results.

Migrating to Meilisearch

We started looking for alternative search engines. Our preference was for open
source projects developed in Go (because we have experience in that language).
Although there are many projects that match those requirements, many of them
lacked active development or a big enough community, or missed small features
that are essential for us, like support for synonyms (more on that later).

After broadening our requirements, we found one project that met most of our
preferences: Meilisearch. It’s a startup created in 2018 that has taken the
search engine market by storm. Meilisearch provides an open source search engine
developed in Rust and also a hosted search solution.

Using Meilisearch has been a delight so far. It starts with a quick and
simple installation process, continues with great, thorough documentation,
and ends with a powerful search engine with mind-blowing performance.
Meilisearch feels fun, fresh and uncomplicated.

Technical Integration

Meilisearch provides integrations with lots of programming languages and frameworks.
There is a Meilisearch Symfony bundle, but we don’t use it because, as some
folks on the Symfony Slack advised us, it’s mainly focused on indexing Doctrine
entities/documents.

Symfony documentation is built with symfony-tools/docs-builder, which
parses RST documents and outputs JSON files with a certain structure (title, body,
TOC, pagination, etc.). That’s why we used the Meilisearch PHP integration instead,
together with the Symfony HttpClient component.

Parsing Symfony Documentation

In Meilisearch, Documents are a core concept that refers to each of the items
stored in an index. Each Document contains one or more fields, each of them
consisting of a key-value pair of arbitrary information. For Symfony Docs, this
is what a Meilisearch Document looks like:

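namespace App\Search\Dto;

class DocumentDto
{
    // heading level of the section (1 for <h1>, 2 for <h2>, etc.); -1 means not set
    private int $level = -1;
    private string $title = '';
    // HTML contents of the section, without its nested subsections
    private string $content = '';
    // Symfony Docs version this section belongs to
    private string $version = '';
    // absolute URL of the section, including its #fragment
    private string $url = '';

    public static function createFromSymfonyDocument(…): self
    {
        // …
    }
}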

For each Symfony Docs version, we find the JSON files created by the docs-builder
tool and parse their contents to create the Meilisearch Documents. First, we remove
some elements from the generated HTML to improve the results (such as "version added"
directives and most code blocks).
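As a rough illustration of that cleanup step, here is a minimal sketch using PHP's built-in DOM extension. It assumes the elements to remove can be identified by a CSS class; the `removeElementsByClass()` helper and the class names in the example are hypothetical, so check the HTML produced by the docs builder for the real markup. Unlike the process described above, which keeps some code blocks, this sketch removes every matching element.

```php
<?php
// Minimal sketch: removes every element that has one of the given CSS classes
// and returns the remaining HTML. The class names you pass are assumptions
// about the generated markup, not names taken from symfony.com's code.
function removeElementsByClass(string $html, array $classes): string
{
    $dom = new DOMDocument();
    // libxml doesn't know HTML5 tags; silence its warnings instead of failing
    $dom->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);
    $xpath = new DOMXPath($dom);

    foreach ($classes as $class) {
        // match elements whose class attribute contains $class as a whole word
        $query = sprintf('//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]', $class);
        // copy the results to an array before detaching nodes from the tree
        foreach (iterator_to_array($xpath->query($query)) as $node) {
            $node->parentNode?->removeChild($node);
        }
    }

    // loadHTML() wraps fragments in <html><body>, so serialize only <body>'s children
    $result = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    foreach ($body?->childNodes ?? [] as $child) {
        $result .= $dom->saveHTML($child);
    }

    return $result;
}
```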

Then, we split the entire doc page by section: each <h1>, <h2>, <h3>,
etc. creates a new standalone Document. For example, consider the following
simplified doc page:

<div class="section">
    <h1>First title</h1>
    <p>Some content</p>

    <div class="section">
        <h2>Second title</h2>
        <p>More content</p>
    </div>

    <div class="section">
        <h2>Third title</h2>
        <p>Final content</p>
    </div>
</div>

This Symfony page generates three Meilisearch Documents:

Document{ level: 1, title: 'First title', content: '<p>Some content</p>' }
Document{ level: 2, title: 'Second title', content: '<p>More content</p>' }
Document{ level: 2, title: 'Third title', content: '<p>Final content</p>' }

We have to do this because many Symfony Docs pages are written as "reference pages"
that explain all the main things you need to know about some feature. That’s why
some of them are very long (e.g. configuration.rst has about 1,200 lines and
routing.rst has about 2,700 lines).

Splitting each page into multiple small documents is what produces
better search results that are closely related to the query terms.
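The splitting step can be sketched as follows, assuming the `<div class="section">` structure shown in the example above. `splitIntoSections()` is a hypothetical helper, not the code used on symfony.com; it returns plain arrays instead of `DocumentDto` objects to stay self-contained, and it ignores character-encoding concerns (`DOMDocument::loadHTML()` assumes ISO-8859-1 unless the markup declares a charset).

```php
<?php
// Minimal sketch: turns every <div class="section"> into one document whose
// title is the section's first heading and whose content is the HTML of its
// other direct children, skipping nested sections (they get their own document).
function splitIntoSections(string $html): array
{
    $dom = new DOMDocument();
    // libxml doesn't know HTML5 tags; silence its warnings instead of failing
    $dom->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);
    $xpath = new DOMXPath($dom);

    $documents = [];
    // XPath returns sections in document order: parents before their subsections
    $sections = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " section ")]');
    foreach ($sections as $section) {
        $level = -1;
        $title = '';
        $content = '';

        foreach ($section->childNodes as $child) {
            if (!$child instanceof DOMElement) {
                continue; // skip non-element nodes, e.g. whitespace between tags
            }
            // the first <h1>..<h6> child defines the section level and title
            if (-1 === $level && 1 === preg_match('/^h([1-6])$/', $child->tagName, $matches)) {
                $level = (int) $matches[1];
                $title = trim($child->textContent);
                continue;
            }
            // nested sections are indexed as separate documents
            if ('div' === $child->tagName && str_contains(' '.$child->getAttribute('class').' ', ' section ')) {
                continue;
            }
            $content .= $dom->saveHTML($child);
        }

        $documents[] = ['level' => $level, 'title' => $title, 'content' => $content];
    }

    return $documents;
}
```

Run on the example page above, it should return the three documents listed earlier, with parent sections before their subsections.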

Indexing Documents

Indexing the documents is the simplest part of the process. It’s roughly like this:

use Meilisearch\Client;

$client = new Client($meilisearchEndpoint, $meilisearchMasterKey);
// if the index does not exist, it’s created when indexing the first document
$index = $client->index('sfdocs');

// each document is the array returned by DocumentDto::getAsArray()
$documents = […];
$index->addDocuments($documents);

However, in order to produce better results, there are a couple of tweaks that
you should make: stopwords and synonyms.

Stopwords are the words that are filtered out in the search index because
they are insignificant. You might think of configuring only the usual stopwords
(e.g. for English: a, an, and, the, etc.), but you’ll probably
need to add dozens of stopwords depending on the language of your content:

// check https://sites.google.com/site/kevinbouge/stopwords-lists
$index->updateStopWords(['a', "ain't", 'am', 'an', 'any', 'are', 'as', /* … */]);
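Rather than hardcoding a long array, you might load the list from a text file. This is a minimal sketch assuming a file with one stopword per line, where lines starting with `#` are treated as comments; `loadStopWords()` and the file name in the usage comment are hypothetical.

```php
<?php
// Hypothetical helper: loads a plain-text stopwords list (one word per line)
// and normalizes it to lowercase, without duplicates.
function loadStopWords(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if (false === $lines) {
        throw new RuntimeException(sprintf('Cannot read stopwords file "%s".', $path));
    }

    $words = [];
    foreach ($lines as $line) {
        $word = strtolower(trim($line));
        // skip blank lines and comment lines
        if ('' === $word || str_starts_with($word, '#')) {
            continue;
        }
        $words[] = $word;
    }

    // array_unique() keeps the first occurrence; array_values() reindexes the list
    return array_values(array_unique($words));
}

// usage, assuming $index is the Meilisearch index created earlier:
// $index->updateStopWords(loadStopWords(__DIR__.'/stopwords-en.txt'));
```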

Synonyms are essential to create better search results for content like
Symfony Docs. Meilisearch uses "prefix search" by default: searching for config
also finds any word with that prefix (configs, configuration, configurations,
configuring, etc.), but that’s not enough for us.

We also need to consider that yaml and yml are the same; cli, console
and terminal are the same, etc. Thanks to Symfony Slack folks, we crafted
a long list of synonyms:

$index->updateSynonyms([
    'dotenv' => ['.env'],
    // …
    'env var' => ['envvar', 'environment variable', 'environment variables'],
    'dependency injection' => ['di', 'dic', 'dependencyinjection', 'service container'],
    // …
    'shell' => ['bash', 'sh'],
]);

In case you are wondering, yes, we also added a synonym for symphony and
symfony.
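Keep in mind that synonyms in Meilisearch are one-way: an entry like `'dotenv' => ['.env']` makes a search for dotenv also match .env, but not the other way around. Mutual synonyms such as yaml/yml must be declared in both directions. A small hypothetical helper like the following could expand groups of equivalent terms into that format before calling `updateSynonyms()`:

```php
<?php
// Hypothetical helper: expands groups of equivalent terms into the one-way
// "term => [synonyms]" map expected by updateSynonyms(), declaring every
// direction explicitly so that each term of a group finds all the others.
function mutualSynonyms(array $groups): array
{
    $synonyms = [];
    foreach ($groups as $group) {
        foreach ($group as $term) {
            // every other term of the group becomes a synonym of $term
            $others = array_values(array_filter($group, fn (string $other): bool => $other !== $term));
            // merge with synonyms from other groups the term may already belong to
            $synonyms[$term] = array_values(array_unique(array_merge($synonyms[$term] ?? [], $others)));
        }
    }

    return $synonyms;
}

// usage, assuming $index is the Meilisearch index created earlier:
// $index->updateSynonyms(mutualSynonyms([['yaml', 'yml'], ['cli', 'console', 'terminal']]));
```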

Regarding the reindexing of content, the trick is to generate stable IDs
for the Meilisearch documents. The most important key-value pair in a document is
called id, which is treated as the "primary key" of the document.

If two documents in the same index have the same id, they are treated
as the same document and the preceding document is overwritten. So, when we
update the Symfony Docs contents, we only have to index the documents again as we did
the first time, and Meilisearch will create or update the index documents as needed.

In our case, each doc page section has a stable and unique identifier: its
absolute URL, including the fragment (e.g. https://symfony.com/doc/current/configuration.html#using-php-configbuilders).

class DocumentDto
{
    // …

    // this array is what’s passed to the Meilisearch PHP API
    // (an instance method, because it reads the properties of the current object)
    public function getAsArray(): array
    {
        return [
            'id' => hash('xxh3', $this->url),
            'level' => $this->level,
            // …
        ];
    }
}

xxh3 is a non-cryptographic hashing algorithm, available since PHP 8.1, which provides
excellent randomness and is an order of magnitude faster than MD5/SHA1.
Consider using it when you don’t need hashes for cryptographic purposes.
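To see why this gives stable and valid ids, here is a minimal sketch (PHP 8.1+). Meilisearch only accepts document ids made of alphanumeric characters, hyphens and underscores, so a raw URL (with its `/`, `:` and `#`) can’t be used directly, while the hexadecimal xxh3 hash can. `documentId()` and `upsertById()` are hypothetical helpers; the second is only an in-memory toy model of the overwrite-by-id behavior described above, not how Meilisearch stores data.

```php
<?php
// Derives a Meilisearch-compatible id from a section URL: the xxh3 hash is
// 16 hexadecimal characters, so it only contains allowed id characters.
function documentId(string $url): string
{
    return hash('xxh3', $url);
}

// Toy in-memory model of the overwrite-by-id behavior (not Meilisearch code):
// a document whose id already exists replaces the stored one, others are added.
function upsertById(array $stored, array $documents): array
{
    foreach ($documents as $document) {
        $stored[$document['id']] = $document;
    }

    return $stored;
}
```

Because the same URL always produces the same id, indexing the updated docs a second time replaces each section instead of duplicating it.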

Search Engine Interface

The last step to complete the migration was to update the web interface of the
search engine. Before, we had a lot of JavaScript code to handle the search
autocomplete, the Ajax queries, the rendering of results, etc.

In the new search engine interface, we kept only the JavaScript code that handles
the search input modal and the keyboard navigation of the results. All the other
JavaScript code was removed and replaced by some PHP code.

We’ve always wanted to use more Symfony UX components in symfony.com, so this was
the best opportunity to try Symfony UX Live Components. Take a look at our
search component:

namespace App\Component;

use App\Search\Dto\SearchResultDto;
use App\Search\SearchEngine;
use Symfony\UX\LiveComponent\Attribute\AsLiveComponent;
use Symfony\UX\LiveComponent\Attribute\LiveProp;
use Symfony\UX\LiveComponent\DefaultActionTrait;

#[AsLiveComponent('search')]
class SearchComponent
{
    use DefaultActionTrait;

    // writable: the search input (name="query") updates this property from the browser
    #[LiveProp(writable: true)]
    public ?string $query = null;

    public function __construct(private SearchEngine $searchEngine)
    {
    }

    // available in the Twig template as this.result
    public function getResult(): SearchResultDto
    {
        return $this->searchEngine->search($this->query);
    }
}

And the related Twig template that shows the search input and renders the results:

<div {{ attributes }}>
    <form id="form-search" data-model="*">
        <label for="form-search-input" class="sr-only visually-hidden">Search Symfony Docs</label>
        <input type="search" id="form-search-input" name="query" value="{{ this.query }}"
            placeholder="Search Symfony Docs" aria-label="Search in Symfony documentation"
            autocapitalize="off" autocomplete="off" autocorrect="off" spellcheck="false">
    </form>

    <div class="search-results-wrapper" style="{{ this.result.isEmptyQuery ? 'display: none' }}">
        <div class="search-results">
            {% if this.result.isError %}
                <p class="error"></p>
            {% else %}
                {% if 0 == this.result.totalHits %}
                    <p class="no-results">No results. Try making your query more generic!</p>
                {% else %}
                    {% for result in this.result.hits %}
                        <div class="search-result">
                            <!-- … -->
                        </div>
                    {% endfor %}
                {% endif %}
            {% endif %}
        </div>
    </div>
</div>

The PHP class and Twig template shown above are all you need to make the search
autocomplete work with Ajax HTTP requests. You don’t need to write a single line of JavaScript
to get a beautiful and modern JavaScript interface.

Read the Symfony UX Live Components docs to learn everything about this
essential Symfony feature.
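The component and template above rely on a `SearchResultDto` class that isn’t shown in this post. A hypothetical minimal version, exposing exactly the properties the template reads (`isEmptyQuery`, `isError`, `totalHits` and `hits`), could look like this. It requires PHP 8.1+ for readonly properties; the namespace is omitted and the named constructors are assumptions, not the symfony.com implementation.

```php
<?php
// Hypothetical sketch of the DTO consumed by the Twig template. Each named
// constructor produces one of the states the template handles.
final class SearchResultDto
{
    private function __construct(
        public readonly bool $isEmptyQuery,
        public readonly bool $isError,
        public readonly int $totalHits,
        public readonly array $hits,
    ) {
    }

    // blank search input: the template hides the results wrapper
    public static function forEmptyQuery(): self
    {
        return new self(true, false, 0, []);
    }

    // the search backend failed: the template renders the error paragraph
    public static function forError(): self
    {
        return new self(false, true, 0, []);
    }

    // normal case: $hits are the documents returned by the search engine
    public static function forHits(array $hits, int $totalHits): self
    {
        return new self(false, false, $totalHits, $hits);
    }
}
```

Twig resolves `this.result.isEmptyQuery` to a public property with that name, so public readonly properties are enough here; a real implementation might use getters instead.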

Evaluating the Result

At this point, we had a search engine for Symfony Docs that was simple to build
and operate. But how good was it in terms of search result quality and performance?

Regarding performance: in total, including all Symfony Docs versions and the
bundles documentation, we parse 12,754 files and create 85,146 Meilisearch documents.
The total time spent to parse, create and index all those documents is 17 seconds.
The search index takes about 1.2 GB of disk space and most queries take less
than 10 milliseconds.
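For a rough sense of scale, dividing those figures (simple arithmetic on the numbers above, not an additional measurement) gives the average number of documents per parsed file and the overall indexing throughput:

```latex
\frac{85{,}146\ \text{documents}}{12{,}754\ \text{files}} \approx 6.7\ \text{documents per file}
\qquad
\frac{85{,}146\ \text{documents}}{17\ \text{s}} \approx 5{,}000\ \text{documents per second}
```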

Regarding the search result quality, we made an exhaustive comparison between
the previous search engine and the new Meilisearch engine. We tested usual queries,
weird queries, long queries, queries with typos, etc. Here’s a selection of the
results obtained (left: before, right: after).

Conclusion and Future Developments

The new Meilisearch-based search engine is already deployed on symfony.com for
you to try. Thanks to Meilisearch for their help during this migration process
and for kindly providing the hosting of our index data.

Meilisearch provides all the features that we need (indexing arbitrary information,
sorting, result weighting, stopwords, synonyms, facets and filtering, etc.) and many
features that we don’t need yet (like geosearch).

However, it has some known limitations that you should check before trying to
use it in your projects. If your project requirements are complex (e.g. cross-faceting
between multiple indexes, etc.) you should also check if those advanced features
are already supported. Check the Meilisearch public roadmap too.

Finally, we want to add more features to the new search engine so you can filter
results by doc type and version. We’ll do that in the coming weeks, along with
other changes and improvements suggested by the community.

Tell us what you think about this via Twitter, Slack or in the comments below.

Symfony Blog
