Avatar of dkorolev
Principal Maintainer at Current
Shared insights

Ruby NLP C++ Grammar #BNF

At FriendlyData we had a Ruby-based pipeline for natural language processing. Our technology is centered on grammar-based natural language parsing, along with various product features. Since the company's core stack has historically been Ruby, the initial version of the pipeline was implemented in Ruby as well.

As we were entering the exponential growth phase, both technology- and product-wise, we looked into how we could improve the performance and flexibility of our [meta-]BNF-based parsing engine. Gradually, we built pieces of the engine in C++.

Ultimately, the natural language parsing stack spans three universes and three software engineering paradigms: the declarative one, the functional one, and the imperative one. The imperative part was and remains implemented in Ruby. The functional part is implemented in a functional language (this part is under NDA, while everything I am talking about here comes from the public talks we gave throughout 2017 and 2018). The declarative part, which can loosely be thought of as BNF-based, is now served by the C++ engine.

The C++ engine for the BNF part removed the immediate blockers, gave us a speedup of more than 500x, and enabled us to launch new product features, most notably query completions, suggestions, and spelling corrections.


Dima Korolev
