Haystack is an open source NLP framework to interact with your data using Transformer models and LLMs (GPT-4, ChatGPT, etc.). It offers production-ready tools to build NLP backend services, e.g., question answering or semantic search. | It is a powerful agent-first search engine that enables you to run a webscale search engine locally or to connect via remote API. It's ideal for both Large Language Models (LLMs) and human users. |
Question answering;
Semantic document search;
Latest models;
Vector databases;
Scalable pipelines; | Allows uploading of local data or tailoring of provided datasets to meet specific needs;
Facilitates operation in a completely offline environment;
Offers fully managed access through a dedicated API for seamless integration into various workflows |
Statistics | |
GitHub Stars 23.2K | GitHub Stars 510 |
GitHub Forks 2.5K | GitHub Forks 49 |
Stacks 6 | Stacks 0 |
Followers 12 | Followers 0 |
Votes 0 | Votes 0 |
Integrations | |

It lets you either batch index and search data stored in an SQL database, NoSQL storage, or just files quickly and easily — or index and search data on the fly, working with it pretty much as with a database server.

rasa NLU (Natural Language Understanding) is a tool for intent classification and entity extraction. You can think of rasa NLU as a set of high level APIs for building your own language parser using existing NLP and ML libraries.

It is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. It comes with pre-trained statistical models and word vectors, and currently supports tokenization for 49+ languages.

It builds completely static HTML sites that you can host on GitHub pages, Amazon S3, or anywhere else you choose. There's a stack of good looking themes available. The built-in dev-server allows you to preview your documentation as you're writing it. It will even auto-reload and refresh your browser whenever you save your changes.

It can be used to complement any regular touch user interface with a real time voice user interface. It offers real time feedback for faster and more intuitive experience that enables end user to recover from possible errors quickly and with no interruptions.

Lucene Core, our flagship sub-project, provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.

Turn emails, tweets, surveys or any text into actionable data. Automate business workflows and saveExtract and classify information from text. Integrate with your App within minutes. Get started for free.

It is geared towards building search systems for any kind of data, including text, images, audio, video and many more. With the modular design & multi-layer abstraction, you can leverage the efficient patterns to build the system by parts, or chaining them into a Flow for an end-to-end experience.

That transforms AI-generated content into natural, undetectable human-like writing. Bypass AI detection systems with intelligent text humanization technology

It is a framework built around LLMs. It can be used for chatbots, generative question-answering, summarization, and much more. The core idea of the library is that we can “chain” together different components to create more advanced use cases around LLMs.