What is Lucene?
Who uses Lucene?
Here are some stack decisions, common use cases and reviews by companies and developers who chose Lucene in their tech stack.
"Slack provides two strategies for searching: Recent and Relevant. Recent search finds the messages that match all terms and presents them in reverse chronological order. If a user is trying to recall something that just happened, Recent is a useful presentation of the results.
Relevant search relaxes the age constraint and takes into account the Lucene score of the document — how well it matches the query terms (Solr powers search at Slack). Used about 17% of the time, Relevant search performed slightly worse than Recent according to the search quality metrics we measured: the number of clicks per search and the click-through rate of the search results in the top several positions. We recognized that Relevant search could benefit from using the user’s interaction history with channels and other users — their ‘work graph’."
- over 150GB/hour on modern hardware
- small RAM requirements -- only 1MB heap
- incremental indexing as fast as batch indexing
- index size roughly 20-30% the size of text indexed
- ranked searching -- best results returned first
- many powerful query types: phrase queries, wildcard queries, proximity queries, range queries
- fielded searching (e.g. title, author, contents)
- sorting by any field
- multiple-index searching with merged results
- allows simultaneous update and searching
- flexible faceting, highlighting, joins and result grouping
- fast, memory-efficient and typo-tolerant suggesters
- pluggable ranking models, including the Vector Space Model and Okapi BM25
- configurable storage engine (codecs)