Lucene vs Sphinx: What are the differences?

What is Lucene? A high-performance, full-featured text search engine library written entirely in Java. Lucene Core, the flagship sub-project of Apache Lucene, provides Java-based indexing and search technology, as well as spellchecking, hit highlighting, and advanced analysis/tokenization capabilities.
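Lucene's indexing and search rest on the inverted index: each analyzed term maps to the set of documents that contain it, and a query is answered by looking terms up rather than scanning every document. The sketch below is a toy illustration of that idea using only the Java standard library. It assumes a deliberately simple analyzer (lowercase, then split on anything that is not a letter or digit). The class `ToyInvertedIndex` and its methods are hypothetical and are not Lucene's API, which is built around classes such as `IndexWriter`, `Analyzer`, and `IndexSearcher`.

```java
import java.util.*;

/**
 * Hypothetical toy inverted index, for illustration only (not Lucene's API).
 * Maps each analyzed term to the sorted set of ids of documents containing it.
 */
class ToyInvertedIndex {
    // term -> sorted ids of documents containing that term (a "postings list")
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    /** Simplistic analysis: lowercase, then split on runs of non-letter/non-digit chars. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t); // a leading delimiter yields an empty first token
        }
        return tokens;
    }

    /** Stores the document, indexes its terms, and returns its id. */
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Boolean AND: ids of documents that contain every analyzed query term. */
    SortedSet<Integer> searchAll(String query) {
        SortedSet<Integer> result = null;
        for (String term : analyze(query)) {
            SortedSet<Integer> ids = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids); // intersect postings lists
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

For example, after adding "Lucene is a Java search library" (id 0) and "Sphinx is a search server" (id 1), `searchAll("search java")` intersects the postings for `search` ({0, 1}) and `java` ({0}), returning only document 0.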

What is Sphinx? An open source full-text search server, designed from the ground up with performance, relevance (aka search quality), and integration simplicity in mind. Sphinx lets you either batch-index and search data stored in an SQL database, NoSQL storage, or plain files quickly and easily, or index and search data on the fly, working with Sphinx much as you would with a database server. A variety of text processing features enable fine-tuning Sphinx for your particular application requirements, and a number of relevance functions let you tweak search quality as well.
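Sphinx's relevance functions are its rankers, and the built-in set includes BM25-based ones. The sketch below illustrates that family with the textbook Okapi BM25 formula in plain Java. It is built on assumptions. The class name `Bm25`, the pre-tokenized input, the IDF variant (with +1 inside the logarithm so it never goes negative), and the common defaults k1 = 1.2 and b = 0.75 are all illustrative choices. None of this is Sphinx's internal ranker code.

```java
import java.util.*;

/**
 * Hypothetical textbook Okapi BM25 scorer over pre-tokenized documents.
 * Illustrative only; this is not Sphinx's internal implementation.
 */
class Bm25 {
    private final List<List<String>> docs;
    private final Map<String, Integer> docFreq = new HashMap<>(); // term -> number of docs containing it
    private final double avgLen;                                  // average document length in tokens
    private final double k1;                                      // term-frequency saturation
    private final double b;                                       // length-normalization strength

    Bm25(List<List<String>> docs, double k1, double b) {
        this.docs = docs;
        this.k1 = k1;
        this.b = b;
        long total = 0;
        for (List<String> d : docs) {
            total += d.size();
            for (String term : new HashSet<>(d)) docFreq.merge(term, 1, Integer::sum);
        }
        this.avgLen = docs.isEmpty() ? 0.0 : (double) total / docs.size();
    }

    /** IDF with +1 inside the log, so very common terms still score >= 0. */
    double idf(String term) {
        int n = docFreq.getOrDefault(term, 0);
        int total = docs.size();
        return Math.log((total - n + 0.5) / (n + 0.5) + 1.0);
    }

    /** Sum over query terms of idf * saturated, length-normalized term frequency. */
    double score(List<String> query, int docId) {
        List<String> doc = docs.get(docId);
        double lenNorm = 1 - b + b * doc.size() / avgLen;
        double s = 0;
        for (String term : query) {
            long tf = doc.stream().filter(term::equals).count();
            if (tf == 0) continue; // absent terms contribute nothing
            s += idf(term) * tf * (k1 + 1) / (tf + k1 * lenNorm);
        }
        return s;
    }
}
```

The two parameters are the tuning knobs: k1 controls how quickly repeated occurrences of a term stop adding score, and b controls how strongly long documents are penalized relative to the average length.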

Lucene and Sphinx belong to the "Search Engines" category of the tech stack.

Some of the features offered by Lucene are:

  • Indexing throughput of over 150GB/hour on modern hardware
  • Small RAM requirements: only 1MB of heap
  • Incremental indexing as fast as batch indexing
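Incremental indexing in Lucene is cheap because new documents are written into new, immutable segments instead of rewriting the existing index. Searches visit every segment, and segments are merged from time to time. The sketch below is a heavily simplified, hypothetical model of that design using only the standard library. `SegmentedIndex`, `flush`, and `mergeAll` are illustrative names, not Lucene classes, and real Lucene segments store far more than a term-to-ids map.

```java
import java.util.*;

/**
 * Hypothetical, simplified model of segment-based incremental indexing (not Lucene's API).
 * New documents go into an in-memory buffer; flush() freezes the buffer into an
 * immutable segment; search() reads all flushed segments; mergeAll() combines them.
 */
class SegmentedIndex {
    // Each flushed segment maps term -> global doc ids and is never modified afterwards.
    private final List<Map<String, SortedSet<Integer>>> segments = new ArrayList<>();
    private Map<String, SortedSet<Integer>> buffer = new HashMap<>();
    private int nextId = 0;

    /** Buffers a whitespace-tokenized, lowercased document; returns its global id. */
    int add(String text) {
        int id = nextId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) buffer.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Freezes the buffer into a new segment; existing segments are left untouched. */
    void flush() {
        if (buffer.isEmpty()) return;
        segments.add(Collections.unmodifiableMap(buffer));
        buffer = new HashMap<>();
    }

    /** Looks the term up in every flushed segment; unflushed docs are not yet visible. */
    SortedSet<Integer> search(String term) {
        SortedSet<Integer> hits = new TreeSet<>();
        String t = term.toLowerCase(Locale.ROOT);
        for (Map<String, SortedSet<Integer>> seg : segments) {
            hits.addAll(seg.getOrDefault(t, Collections.emptySortedSet()));
        }
        return hits;
    }

    /** Combines all segments into one, as a background merge might. */
    void mergeAll() {
        Map<String, SortedSet<Integer>> merged = new HashMap<>();
        for (Map<String, SortedSet<Integer>> seg : segments) {
            seg.forEach((term, ids) -> merged.computeIfAbsent(term, k -> new TreeSet<>()).addAll(ids));
        }
        segments.clear();
        if (!merged.isEmpty()) segments.add(Collections.unmodifiableMap(merged));
    }

    int segmentCount() { return segments.size(); }
}
```

Because `flush` only appends a segment, the cost of adding a document does not depend on how large the index already is. In this model, that is why incremental adds can keep pace with batch indexing. The search cost of many small segments is paid back later by merging.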

On the other hand, Sphinx provides the following key features:

  • Batch indexing and search of data stored in an SQL database, NoSQL storage, or plain files
  • Real-time ("on the fly") indexing and search, working with Sphinx much as with a database server
  • Text processing features and a number of relevance functions for tuning search quality

Grooveshark, Ansible, and Webedia are some of the popular companies that use Sphinx, whereas Lucene is used by Evernote, Twitter, and Slack. Sphinx has broader approval: it is mentioned in 38 company stacks and 13 developer stacks, compared with Lucene, which is listed in 33 company stacks and 9 developer stacks.

What are some alternatives to Lucene and Sphinx?
Solr
Solr is the popular, blazing-fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable, and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration, and more. Solr powers the search and navigation features of many of the world's largest internet sites.
Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Elasticsearch, Kibana, Beats, and Logstash make up the Elastic Stack (sometimes called the ELK Stack).
Apache Solr
It works with the tools you already use, which makes building applications simple. It is built on the battle-tested Apache ZooKeeper, which makes it easy to scale up and down.
Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
MongoDB
MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.
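Faceted search, listed among Solr's features, means counting how many of the current matches carry each value of a field, so users can refine results (for example, "book (2), video (1)"). In Solr this is requested with query parameters such as `facet=true` and `facet.field`. The sketch below only illustrates the counting idea in plain Java. The class `FacetCounter` and the field names `title` and `type` used in its example are hypothetical, and real faceting is computed inside the engine from indexed field values.

```java
import java.util.*;

/** Hypothetical illustration of facet counting over matched documents (not Solr's API). */
class FacetCounter {
    /** Returns docs whose text field contains the term, ignoring case (a stand-in for a real query). */
    static List<Map<String, String>> filter(List<Map<String, String>> docs, String field, String term) {
        List<Map<String, String>> out = new ArrayList<>();
        String needle = term.toLowerCase(Locale.ROOT);
        for (Map<String, String> d : docs) {
            String v = d.get(field);
            if (v != null && v.toLowerCase(Locale.ROOT).contains(needle)) out.add(d);
        }
        return out;
    }

    /** Counts matched docs per value of the facet field; docs lacking the field are skipped. */
    static Map<String, Integer> facet(List<Map<String, String>> matchedDocs, String field) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by value for stable output
        for (Map<String, String> doc : matchedDocs) {
            String value = doc.get(field);
            if (value != null) counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }
}
```

The key design point is that facets are computed over the query's result set, not the whole collection, so the counts change as the user narrows the search.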