AWS Data Pipeline vs Elasticsearch

Tool               Stacks   Followers   Votes
AWS Data Pipeline  91       368         1
Elasticsearch      29.3K    22.3K       1.6K

AWS Data Pipeline vs Elasticsearch: What are the differences?

What is AWS Data Pipeline?

Process and move data between different AWS compute and storage services. AWS Data Pipeline is a web service that provides a simple management system for data-driven workflows. Using AWS Data Pipeline, you define a pipeline composed of the “data sources” that contain your data, the “activities” or business logic such as EMR jobs or SQL queries, and the “schedule” on which your business logic executes. For example, you could define a job that, every hour, runs an Amazon Elastic MapReduce (Amazon EMR)–based analysis on that hour’s Amazon Simple Storage Service (Amazon S3) log data, loads the results into a relational database for future lookup, and then automatically sends you a daily summary email.
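
To make the three building blocks concrete, here is a minimal sketch of the hourly log-analysis example written as a pipeline definition. It is loosely modeled on the JSON definition format that AWS Data Pipeline accepts (objects with an `id`, a `type`, and `ref` links between them). The helper `build_hourly_pipeline`, the object ids, and the bucket paths are hypothetical, and the exact field names and values are assumptions that should be checked against the AWS documentation before use.

```python
import json


def build_hourly_pipeline(log_path: str, result_path: str) -> dict:
    """Return an illustrative pipeline definition as a plain dict.

    Field names follow the general shape of a pipeline definition file,
    but they are assumptions here, not a verified schema.
    """
    # The "schedule": when the business logic runs.
    schedule = {
        "id": "HourlySchedule",
        "type": "Schedule",
        "period": "1 hours",  # assumed period syntax; verify before use
        "startAt": "FIRST_ACTIVATION_DATE_TIME",
    }
    # The "data sources": where input is read and where output is written.
    log_input = {
        "id": "S3LogInput",
        "type": "S3DataNode",
        "directoryPath": log_path,
        "schedule": {"ref": schedule["id"]},
    }
    result_output = {
        "id": "S3ResultOutput",
        "type": "S3DataNode",
        "directoryPath": result_path,
        "schedule": {"ref": schedule["id"]},
    }
    # The "activity": the business logic, here an EMR-based analysis.
    analysis = {
        "id": "HourlyLogAnalysis",
        "type": "EmrActivity",
        "input": {"ref": log_input["id"]},
        "output": {"ref": result_output["id"]},
        "schedule": {"ref": schedule["id"]},
    }
    return {"objects": [schedule, log_input, result_output, analysis]}


# Hypothetical bucket paths, used only to show the resulting structure.
definition = build_hourly_pipeline(
    "s3://example-bucket/logs/", "s3://example-bucket/results/"
)
print(json.dumps(definition, indent=2))
```

A real definition would also need a compute resource for the activity to run on and the job steps themselves; both are left out to keep the sketch focused on how sources, activities, and the schedule reference each other.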

What is Elasticsearch?

Open Source, Distributed, RESTful Search Engine. Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Elasticsearch, Kibana, Beats and Logstash are the Elastic Stack (sometimes called the ELK Stack).
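
As a rough illustration of what "RESTful" means in practice, the sketch below builds (but does not send) a full-text search request against the `_search` endpoint using a `match` query. The helper `build_search_request`, the index name `products`, and the field name `name` are hypothetical; `http://localhost:9200` assumes a local node on the default HTTP port.

```python
import json
from urllib.request import Request


def build_search_request(base_url: str, index: str, field: str, text: str) -> Request:
    """Build an HTTP request for a full-text `match` query without sending it."""
    body = {"query": {"match": {field: text}}, "size": 10}
    return Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical index and field names, shown only to illustrate the request shape.
request = build_search_request(
    "http://localhost:9200", "products", "name", "wireless headphones"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would execute the search, which requires a running cluster; an official client library would normally wrap this step.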

AWS Data Pipeline can be classified as a tool in the "Data Transfer" category, while Elasticsearch is grouped under "Search as a Service".

Some of the features offered by AWS Data Pipeline are:

  • You can find (and use) a variety of popular AWS Data Pipeline tasks in the AWS Management Console’s template section.
  • Hourly analysis of Amazon S3‐based log data
  • Daily replication of Amazon DynamoDB data to Amazon S3

On the other hand, Elasticsearch provides the following key features:

  • Distributed and Highly Available Search Engine.
  • Multi Tenant with Multi Types.
  • A varied set of APIs, including a RESTful one.

Elasticsearch is an open source tool with 42.4K GitHub stars and 14.2K GitHub forks.

Advice on AWS Data Pipeline and Elasticsearch
Rana Usman Shahid
Chief Technology Officer at TechAvanza · 5 upvotes · 226.9K views
Needs advice on Firebase, Elasticsearch, and Algolia

Hey everybody! (1) I am developing an Android application. I have around 3 million records (less than a TB) that I want to store in the cloud. Which company provides the best cloud database service for my scenario? It should be secure, usable over the long term, and offer good service. I decided to use the Firebase Realtime Database. Should I stick with Firebase, or are there other companies that provide a better service?

(2) My app also needs to search the same data (less than a TB). Which search solution should I use in this case? I found Elasticsearch and Algolia. It should be secure and fast. If any other company provides a better service than these, please feel free to suggest it.

Thank you!

Replies (2)
Josh Dzielak
Co-Founder & CTO at Orbit · 8 upvotes · 172K views
Recommends Algolia

Hi Rana, good question! From my Firebase experience, 3 million records is not too big at all, as long as the cost is within reason for you. With Firebase you will be able to access the data from anywhere, including an Android app, and implement fine-grained security with JSON rules. The real-time-ness works perfectly. As a fully managed database, Firebase really takes care of everything. The only thing to watch out for is whether you need complex query patterns; Firestore (also in the Firebase family) can be a better fit there.

To answer question 2: the right answer will depend on what's most important to you. Algolia is like Firebase in that it is fully managed, very easy to set up, and has great SDKs for Android. Algolia is really a full-stack search solution in this case, and it is easy to connect with your Firebase data. Bear in mind that Algolia does cost money, so you'll want to make sure the cost is okay for you, but you will save a lot of engineering time and never have to worry about scale. The search-as-you-type performance with Algolia is flawless, as that is a primary aspect of its design. Elasticsearch can store tons of data, is very flexible, is hosted cheaply by many cloud services, and has many users. If you haven't done a lot with search before, the learning curve for getting results ranked properly is steeper than with Algolia, and there is another learning curve if you want to do the DevOps part yourself. Both are very good platforms for search. Algolia shines when building your app is the most important thing and you don't want to spend many engineering hours; Elasticsearch shines when you have a lot of data and don't mind learning how to run and optimize it.

Mike Endale
Recommends Cloud Firestore

Rana - we use Cloud Firestore at our startup. It handles many millions of records without any issues. It provides the same set of features as the Firebase Realtime Database, on top of indexing and security trims. The only thing to watch out for is to make sure your Cloud Functions have proper exception handling and contain no infinite loops. Either problem will be costly if not caught quickly.

For search, Algolia is a great option, but cost is a real consideration. Indexing a large number of records can be cost-prohibitive for most projects. Elasticsearch is a solid alternative, but it requires a little additional work to configure and maintain if you want to self-host.

Hope this helps.

Pros of AWS Data Pipeline
  • Easy to create DAG and execute it (1)

Pros of Elasticsearch
  • Powerful API (322)
  • Great search engine (314)
  • Open source (230)
  • RESTful (214)
  • Near real-time search (199)
  • Free (96)
  • Search everything (83)
  • Easy to get started (54)
  • Analytics (45)
  • Distributed (26)
  • Fast search (6)
  • More than a search engine (5)
  • Easy to scale (3)
  • Awesome, great tool (3)
  • Great docs (3)
  • Potato (2)
  • Document Store (2)
  • Great customer support (2)
  • Intuitive API (2)
  • Reliable (2)
  • NoSQL DB (2)
  • Fast (2)
  • Easy setup (2)
  • Highly Available (2)
  • Great piece of software (2)
  • Ecosystem (1)
  • Scalability (1)
  • Not stable (1)
  • GitHub (1)
  • Elasticsearch (1)
  • Actively developing (1)
  • Responsive maintainers on GitHub (1)
  • Easy to get hot data (1)
  • Open (1)
  • Community (0)

Cons of AWS Data Pipeline
  • None listed yet

Cons of Elasticsearch
  • Resource hungry (7)
  • Difficult to get started (6)
  • Expensive (5)
  • Hard to keep stable at large scale (4)

What are some alternatives to AWS Data Pipeline and Elasticsearch?

  • AWS Glue: A fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.
  • Airflow: Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command-line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
  • AWS Step Functions: AWS Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly.
  • Apache NiFi: An easy-to-use, powerful, and reliable system to process and distribute data. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
  • AWS Batch: It enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. It dynamically provisions the optimal quantity and type of compute resources (e.g., CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted.