Elasticsearch vs Solr: What are the differences?
Developers describe Elasticsearch as "Open Source, Distributed, RESTful Search Engine". Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Elasticsearch, Kibana, Beats and Logstash are the Elastic Stack (sometimes called the ELK Stack).
On the other hand, Solr is detailed as "An open source enterprise search server based on Lucene search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication etc". Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites.
Elasticsearch and Solr are primarily classified as "Search as a Service" and "Search Engines" tools respectively.
Some of the features offered by Elasticsearch are:
- Distributed and Highly Available Search Engine.
- Multi Tenant with Multi Types.
- Various APIs, including a RESTful API
On the other hand, Solr provides the following key features:
- Advanced Full-Text Search Capabilities
- Optimized for High Volume Web Traffic
- Standards-Based Open Interfaces: XML, JSON and HTTP
"Powerful api" is the primary reason why developers consider Elasticsearch over the competitors, whereas "Powerful" was stated as the key factor in picking Solr.
Elasticsearch is an open source tool with 42.4K GitHub stars and 14.2K GitHub forks.
Uber Technologies, Instacart, and Slack are some of the popular companies that use Elasticsearch, whereas Solr is used by Slack, Coursera, and Zalando. Elasticsearch has broader adoption: it is mentioned in 2003 company stacks and 979 developer stacks, compared with 140 company stacks and 42 developer stacks for Solr.
"Slack provides two strategies for searching: Recent and Relevant. Recent search finds the messages that match all terms and presents them in reverse chronological order. If a user is trying to recall something that just happened, Recent is a useful presentation of the results.
Relevant search relaxes the age constraint and takes into account the Lucene score of the document — how well it matches the query terms (Solr powers search at Slack). Used about 17% of the time, Relevant search performed slightly worse than Recent according to the search quality metrics we measured: the number of clicks per search and the click-through rate of the search results in the top several positions. We recognized that Relevant search could benefit from using the user’s interaction history with channels and other users — their ‘work graph’."
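To illustrate the difference between the two orderings described in the quote, here is a minimal sketch over a toy in-memory list of messages. Recent keeps messages containing every query term and sorts them newest first; Relevant keeps the same matches but sorts by a score. The `Message` fields and the `toy_score` function are hypothetical: the score is a plain term count standing in for the Lucene score, not Slack's or Lucene's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    timestamp: int  # hypothetical field: seconds since epoch


def terms(text: str) -> list[str]:
    """Lowercase whitespace tokenization, standing in for a real Lucene analyzer."""
    return text.lower().split()


def matching(messages: list[Message], query: str) -> list[Message]:
    """Keep only messages that contain every query term."""
    wanted = set(terms(query))
    return [m for m in messages if wanted <= set(terms(m.text))]


def recent_search(messages: list[Message], query: str) -> list[Message]:
    """Recent: matching messages in reverse chronological order."""
    return sorted(matching(messages, query), key=lambda m: m.timestamp, reverse=True)


def toy_score(message: Message, query: str) -> int:
    """Count occurrences of query terms; a crude stand-in for the Lucene score."""
    tokens = terms(message.text)
    return sum(tokens.count(t) for t in set(terms(query)))


def relevant_search(messages: list[Message], query: str) -> list[Message]:
    """Relevant: the same matches, ordered by score instead of age."""
    return sorted(matching(messages, query), key=lambda m: toy_score(m, query), reverse=True)
```

Given an older message "search search index rebuilt" and a newer one "new search box shipped", Recent puts the newer message first, while Relevant ranks the older one first because it mentions the term twice. The "work graph" signals Slack mentions would be extra inputs to the score and are not modeled here.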
Although we were using Elasticsearch in the beginning to power our in-app search, we moved this part of our processing over to Algolia a couple of months ago; this has proven to be a fantastic choice, letting us build search-related features with more confidence and speed.
Elasticsearch is now used only for search in internal tooling; hosting and running it reliably took up too much of our time in the past, and fine-tuning the results to reach a great user experience was never easy for us either. With Algolia we can flexibly change ranking methods on the fly and instead focus our time on fine-tuning the experience within our app.
Memcached is used in front of most of the API endpoints to cache responses, which speeds up response times and reduces server costs on our side.
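A minimal sketch of caching responses in front of an API endpoint, as described above. To keep it self-contained, an in-memory dict with expiry times stands in for Memcached; with a real Memcached client the `get`/`set` calls would go over the network instead. The `TTLCache` class, the `cached_response` decorator, the TTL value, and the `get_product` endpoint are all hypothetical.

```python
import functools
import time
from typing import Any, Callable


class TTLCache:
    """In-memory stand-in for Memcached: key -> (expires_at, value)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._store: dict[str, tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (self._clock() + ttl, value)


def cached_response(cache: TTLCache, ttl: float):
    """Serve a cached response when present; otherwise call the endpoint and cache it."""
    def decorator(endpoint: Callable[..., Any]):
        @functools.wraps(endpoint)
        def wrapper(*args: Any) -> Any:
            key = f"{endpoint.__name__}:{args!r}"
            hit = cache.get(key)
            if hit is not None:  # note: a None response is never treated as a hit
                return hit
            response = endpoint(*args)
            cache.set(key, response, ttl)
            return response
        return wrapper
    return decorator
```

Decorating an endpoint with `@cached_response(cache, ttl=60)` means repeated calls with the same arguments within 60 seconds skip the endpoint body entirely, which is where the response-time and server-cost savings come from.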
Back in 2014, I was given an opportunity to re-architect the SmartZip Analytics platform and its flagship product, SmartTargeting. SmartTargeting is SaaS software that helps real estate professionals keep up with their prospects and leads in a given neighborhood or territory, find out (thanks to predictive analytics) who is most likely to list or sell their home, and run cross-channel marketing automation against them: direct mail, online ads, email, and so on. The company also provides Data APIs to Enterprise customers.
I had inherited years and years of technical debt, and I knew things had to change radically. The first enabler for this was to make use of the cloud and go with AWS, so we would stop reinventing the wheel and build around managed, scalable services.
For the SaaS product, we kept working with Rails, as this was what my team had the most knowledge of. However, we broke up the monolith and decoupled the front-end application from the backend using Rails API, so that from then on we had independently scalable microservices.
Our various applications could now be deployed using AWS Elastic Beanstalk, so we no longer wasted effort writing time-consuming Capistrano deployment scripts, for instance. Combined with Docker, each application ran within its own container, independently of the underlying host configuration.
Storage-wise, we went with Amazon S3 and ditched any pre-existing local or network storage people used to deal with in our legacy systems. On the database side, we started with Amazon RDS / MySQL and ultimately migrated to Amazon RDS for Aurora / MySQL when it was released. Once again, here you want a managed service that your cloud provider handles for you.
Future improvements / technology decisions included:
- Caching: Amazon ElastiCache / Memcached
- CDN: Amazon CloudFront
- Systems Integration: Segment / Zapier
- Data-warehousing: Amazon Redshift
- BI: Amazon Quicksight / Superset
- Search: Elasticsearch / Amazon Elasticsearch Service / Algolia
- Monitoring: New Relic
As our usage grew, patterns changed, and our business needs evolved, my role as Engineering Manager and then Director of Engineering was also to ensure my team kept learning and innovating while delivering business value.
One of these innovations was getting into serverless: adopting AWS Lambda was a big step forward. At the time, Lambda was only available for Node.js (not Ruby), but it was a great way to handle cost efficiency, unpredictable traffic, and sudden bursts of traffic. Ultimately you want the whole chain of services involved in a call to be serverless, and that's when we started leveraging Amazon DynamoDB on these projects so they would be fully scalable.
I'm planning to create a web application and a mobile application that provide a very good shopping experience to end customers. In short, my application will aggregate product details from different sources and give users a clear picture of when and where to buy a product at the best quality and cost.
I plan to develop this over several milestones, adding features incrementally, and I have picked the core part (aggregating product details from different sources) as the first milestone.
Based on my work experience and knowledge, I have chosen the following stack for this mission.
Service: I plan to use Java as the main business-layer language; I have 7+ years of experience with it, so I believe I can do better work in Java than in other languages. I'm also considering Node.js.
Database and ORM: I'm going to pick MySQL as the DB and Hibernate as the ORM, since I have good knowledge of and work experience with this combination.
Search Engine: I need to deal with a large amount of product data and its detailed information to give end users enough detail, while also focusing on performance. So I have decided to use Solr as the search engine for product search and suggestions. I may replace Solr with Elasticsearch once I have explored and reviewed Elasticsearch enough.
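Since the plan is to start on Solr and possibly move to Elasticsearch later, here is a minimal sketch of the same product query expressed for both engines. It only builds the requests and makes no network calls. The core/index name `products` and the fields `name`, `description`, and `brand` are hypothetical; `q`, `defType=edismax`, `qf`, `rows`, `facet`, and `facet.field` are standard Solr query parameters, and a `multi_match` query with a `terms` aggregation is the usual Elasticsearch counterpart.

```python
from urllib.parse import urlencode


def solr_select_url(base: str, text: str, rows: int = 10) -> str:
    """Build a Solr /select URL for a product search with a brand facet."""
    params = [
        ("q", text),
        ("defType", "edismax"),        # parse the user's text across several fields
        ("qf", "name^3 description"),  # boost matches in the (hypothetical) name field
        ("rows", str(rows)),
        ("facet", "true"),
        ("facet.field", "brand"),      # hypothetical facet field
        ("wt", "json"),
    ]
    return f"{base}/products/select?{urlencode(params)}"


def es_search_body(text: str, size: int = 10) -> dict:
    """Roughly equivalent Elasticsearch _search request body."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["name^3", "description"],  # same field boost as qf above
            }
        },
        "aggs": {"brands": {"terms": {"field": "brand"}}},  # facet counterpart
    }
```

Keeping the query-building logic behind small functions like these is one way to limit how much application code a later Solr-to-Elasticsearch switch would touch.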
Host: For now, my plan is to complete the application with decent features first and deploy it with Docker to a free hosting environment such as Heroku. Once it is stable, I plan to use the AWS products Amazon S3, EC2, Amazon RDS and Amazon Route 53. I'm not sure what Microsoft Azure offers over Heroku and Amazon EC2 Container Service. Anyhow, I will explore these again and pick the one that best suits my requirements once I reach this stage.
Build and Repositories: I have decided to choose Apache Maven and Git, as these are my favorites and are also popular for builds and repositories, respectively.
Additional Utilities :) - I would like to choose Codacy for code review, as their Startup plan will be very helpful for this application. I'm already experienced with Google CheckStyle and SonarQube, but I'd like to try Codacy.
Happy Coding! Suggestions are welcome! :)
Elasticsearch is the engine that powers search on the site. From a high-level perspective, it’s a Lucene wrapper that exposes Lucene’s features via a RESTful API. It handles the distribution of data and simplifies scaling, among other things.
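To make "handles the distribution of data" concrete: Elasticsearch assigns each document to a primary shard using roughly `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document ID. The sketch below mimics that rule. It uses `zlib.crc32` as a stand-in hash because Elasticsearch's own hash is Murmur3, so the shard numbers will not match a real cluster.

```python
import zlib
from collections import Counter


def route(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a document (crc32 stands in for Murmur3)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def distribution(doc_ids: list[str], num_primary_shards: int) -> Counter:
    """Count how many documents land on each shard."""
    return Counter(route(d, num_primary_shards) for d in doc_ids)
```

Because the shard is a function of the shard count, changing that count would send most existing documents to different shards. This is why the number of primary shards is chosen at index creation and, short of the split/shrink APIs or a reindex, stays fixed.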
Given that we are on AWS, we use an AWS cloud plugin for Elasticsearch that makes it easy to work in the cloud. It allows us to add nodes without much hassle. It will take care of figuring out if a new node has joined the cluster, and, if so, Elasticsearch will proceed to move data to that new node. It works the same way when a node goes down. It will remove that node based on the AWS cluster configuration.
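A deliberately simplified sketch of the behavior described above: when a node joins, shards move so every node holds a similar number, and when a node leaves, its shards are reassigned to the survivors. Elasticsearch's real allocator also weighs disk usage, replica placement, and allocation rules, and node discovery on EC2 is the plugin's job. None of that is modeled here, and the node and shard names are hypothetical.

```python
Assignment = dict[str, list[str]]  # node name -> shard names held by that node


def rebalance(assignment: Assignment) -> Assignment:
    """Move shards from the fullest node to the emptiest until counts differ by at most 1."""
    result = {node: list(shards) for node, shards in assignment.items()}
    while True:
        fullest = max(result, key=lambda n: len(result[n]))
        emptiest = min(result, key=lambda n: len(result[n]))
        if len(result[fullest]) - len(result[emptiest]) <= 1:
            return result
        result[emptiest].append(result[fullest].pop())


def node_joined(assignment: Assignment, node: str) -> Assignment:
    """Add an empty node, then spread shards onto it."""
    grown = {n: list(s) for n, s in assignment.items()}
    grown[node] = []
    return rebalance(grown)


def node_left(assignment: Assignment, node: str) -> Assignment:
    """Drop a node and hand its shards to the least-loaded survivors."""
    remaining = {n: list(s) for n, s in assignment.items() if n != node}
    if not remaining:
        raise ValueError("no surviving nodes to take over the shards")
    for shard in assignment[node]:
        emptiest = min(remaining, key=lambda n: len(remaining[n]))
        remaining[emptiest].append(shard)
    return rebalance(remaining)
```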
The very first version of search was just a Postgres database query. It wasn’t terribly efficient, so at some point we moved over to ElasticSearch, and since then Andrew has done a lot of work with it. ElasticSearch is amazing, but out of the box it doesn’t come configured with all the nice things; you spend a lot of time figuring out how to put it all together to add stemming, auto suggestions, and all kinds of other things, even spelling adjustments, and handling cases like "tomato" and "tomatoes", which would otherwise return different results. Andrew did a ton of work to make it really, really nice and built a very simple Ruby gem called SearchKick.
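The "tomato"/"tomatoes" problem above is what an analyzer with a stemming token filter addresses. Below, `index_settings` sketches Elasticsearch index settings (assuming Elasticsearch 7 or later, where mappings are typeless) that chain the built-in `lowercase` and `porter_stem` token filters under a hypothetical analyzer name `english_stemmed` on a hypothetical `name` field. `toy_stem` is a tiny suffix-stripper, not the Porter algorithm, included only to show why running both documents and queries through the same stemmer makes the two words match. This is not how SearchKick itself is implemented.

```python
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "english_stemmed": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "porter_stem"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {"type": "text", "analyzer": "english_stemmed"}  # hypothetical field
        }
    },
}


def toy_stem(word: str) -> str:
    """Strip a few plural suffixes; a toy illustration, far cruder than Porter."""
    w = word.lower()
    for suffix, replacement in (("ies", "y"), ("oes", "o"), ("s", "")):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)] + replacement
    return w


def matches(query: str, document: str) -> bool:
    """True if every stemmed query term appears among the stemmed document terms."""
    doc_stems = {toy_stem(t) for t in document.split()}
    return all(toy_stem(t) in doc_stems for t in query.split())
```

With the toy stemmer, `matches("tomato", "Fresh tomatoes on sale")` is true because both sides reduce to "tomato", while a plain exact-token comparison would miss it.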
We use ElasticSearch for
- Session Logs
We originally self-managed the ElasticSearch clusters, but due to our small ops team we opted to move to managed AWS services where possible.
The managed servers, however, do not allow us to manage our own backups, and a restore actually requires opening a support ticket with AWS. We ended up setting up our own nightly backups, since we use per-day indices for the logs/analytics.
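A minimal sketch of the kind of nightly backup job described above, assuming one index per day named like `logs-YYYY.MM.DD` and an already-registered snapshot repository called `nightly-backups` (both hypothetical). It only builds the request for Elasticsearch's snapshot API (`PUT /_snapshot/<repository>/<snapshot>`); sending it and scheduling it are left out.

```python
from datetime import date, timedelta


def daily_index(day: date, prefix: str = "logs-") -> str:
    """Name of the per-day index for a given date (hypothetical naming scheme)."""
    return f"{prefix}{day:%Y.%m.%d}"


def nightly_snapshot_request(today: date, repository: str = "nightly-backups"):
    """Build (method, path, body) to snapshot yesterday's completed index."""
    yesterday = today - timedelta(days=1)
    snapshot = f"snapshot-{yesterday:%Y.%m.%d}"  # snapshot names must be lowercase
    path = f"/_snapshot/{repository}/{snapshot}?wait_for_completion=true"
    body = {"indices": daily_index(yesterday), "include_global_state": False}
    return "PUT", path, body
```

Snapshotting yesterday's index rather than today's is a deliberate choice: the current day's index is still receiving writes, while yesterday's is complete, so each nightly snapshot captures one finished day.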
Elasticsearch has good tooling and supports a large API that makes it ideal for denormalizing data. It has a simple-to-use aggregations API that tends to cover most of what I need a BI tool to do, especially early on (when paired with Kibana). It's also handy when you just want to search some text.
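As an example of the BI-style questions the aggregations API can answer, here is a minimal sketch of a `_search` body that counts documents per day and breaks each day down by the most frequent event type, plus a helper that flattens the nested buckets into report rows. The field names `@timestamp` and `event.keyword` are hypothetical, and `calendar_interval` assumes Elasticsearch 7.2 or later (older versions use `interval`).

```python
def daily_event_counts_body(field: str = "event.keyword") -> dict:
    """_search body: documents per day, broken down by the top values of `field`."""
    return {
        "size": 0,  # only the aggregations are needed, not the matching documents
        "aggs": {
            "per_day": {
                "date_histogram": {"field": "@timestamp", "calendar_interval": "1d"},
                "aggs": {"top_events": {"terms": {"field": field, "size": 5}}},
            }
        },
    }


def to_rows(response: dict) -> list[tuple[str, str, int]]:
    """Flatten the nested buckets into (day, event, count) rows for a report."""
    rows = []
    for day in response["aggregations"]["per_day"]["buckets"]:
        for event in day["top_events"]["buckets"]:
            rows.append((day["key_as_string"], event["key"], event["doc_count"]))
    return rows
```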
Self-taught: acquired knowledge or skill on one's own initiative. Open source search and analytics. Real-time search and analytics engine. A search engine based on Lucene, it provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
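One of the well-known open-source search engines, along with Elasticsearch. There is a lot to configure at first, and if you don't understand the application well, you will need to make frequent changes. Some things can't be controlled from the Solr client and must be configured on the server, so the server configuration is important. If the server is configured well, there isn't much to the client-side code.
The important thing is the morphological analyzer....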
Full-text search is provided by a Solr cluster, using master/slave replication with Varnish as a cache.
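A minimal sketch of the read/write split this kind of setup implies: updates go to the Solr master, which replicates its index to the slaves, while searches are spread round-robin across the slaves and can be answered from a cache when the same query was seen recently. The host names are hypothetical, and the dict-based cache only stands in for Varnish, which in practice runs as a separate HTTP proxy in front of the slaves.

```python
import itertools
from typing import Callable


class SolrRouter:
    """Route writes to the master and reads round-robin across slaves."""

    def __init__(self, master: str, slaves: list[str]):
        self.master = master
        self._slaves = itertools.cycle(slaves)
        self._cache: dict[str, str] = {}  # stand-in for Varnish: query key -> cached body

    def update_url(self, core: str) -> str:
        # All index changes go to the master; the slaves replicate from it.
        return f"{self.master}/solr/{core}/update"

    def search(self, core: str, query_string: str, fetch: Callable[[str], str]) -> str:
        """Return a cached response if present, otherwise fetch from the next slave."""
        key = f"{core}?{query_string}"
        if key in self._cache:
            return self._cache[key]
        url = f"{next(self._slaves)}/solr/{core}/select?{query_string}"
        body = fetch(url)
        self._cache[key] = body
        return body
```

The sketch never invalidates cached entries; a real deployment also needs to expire or purge them after the slaves pull a new index version, or searches will keep returning stale results.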