Amazon Athena vs Druid: What are the differences?
What is Amazon Athena? Query S3 Using SQL. Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
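As a rough illustration of "query S3 using SQL", the sketch below builds the arguments for boto3's Athena `start_query_execution` call. The table `access_logs`, the database `example_db`, and the bucket `example-bucket` are hypothetical names chosen for the example, and the actual submission is left as a comment because it needs AWS credentials.

```python
def build_athena_request(sql, database, output_location):
    """Build keyword arguments for boto3's Athena start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical table, database, and bucket names, for illustration only.
request = build_athena_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="example_db",
    output_location="s3://example-bucket/athena-results/",
)

# With AWS credentials configured, the request could be submitted like this:
#   import boto3
#   athena = boto3.client("athena")
#   query_id = athena.start_query_execution(**request)["QueryExecutionId"]
```

Because Athena is serverless, there is no cluster to start first; the query string and a result location are essentially all the call needs.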
What is Druid? Fast column-oriented distributed data store. Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte-sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.
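To make the mix of exact and approximate aggregates concrete, here is a minimal sketch that prepares a request for Druid's SQL HTTP endpoint (`/druid/v2/sql`). The datasource `clickstream`, the column `user_id`, and the router address `http://localhost:8888` are assumptions for illustration. The request is only constructed, not sent.

```python
import json
import urllib.request

# Hypothetical datasource and column names. COUNT(*) is exact, while
# APPROX_COUNT_DISTINCT returns an approximate distinct count.
SQL = (
    "SELECT TIME_FLOOR(__time, 'PT1H') AS hour_bucket, "
    "COUNT(*) AS event_count, "
    "APPROX_COUNT_DISTINCT(user_id) AS approx_users "
    "FROM clickstream "
    "GROUP BY 1"
)


def build_druid_sql_request(base_url, sql):
    """Prepare (but do not send) a POST request to Druid's SQL endpoint."""
    payload = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/druid/v2/sql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local router address; adjust for a real cluster.
druid_request = build_druid_sql_request("http://localhost:8888", SQL)

# Against a running cluster, the rows could be fetched with:
#   with urllib.request.urlopen(druid_request) as resp:
#       rows = json.load(resp)
```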
Amazon Athena and Druid are both primarily classified as "Big Data" tools.
"Use SQL to analyze CSV files" is the top reason over 9 developers give for liking Amazon Athena, while over 3 developers cite "Real Time Aggregations" as the leading reason for choosing Druid.
Druid is an open source tool with 8.31K GitHub stars and 2.08K GitHub forks.
According to the StackShare community, Amazon Athena has broader approval: it is mentioned in 50 company stacks and 18 developer stacks, compared to Druid, which is listed in 24 company stacks and 12 developer stacks.
Currently, we need to ingest data from Amazon S3 into a database, either Amazon Athena or Amazon Redshift. The problem is that the data is in .PSV (pipe-separated values) format and is over 200 GB in size. Query performance in Athena/Redshift is not up to the mark: queries run into timeouts and are too slow compared to Google BigQuery. How can I optimize the performance and the query result time? Can anyone please help me out?
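One commonly suggested direction for a case like this, sketched here under assumptions rather than as a verified fix, is to expose the pipe-separated files to Athena as an external table and then rewrite them once into a columnar format such as Parquet with a CTAS statement. The table names, column list, and S3 paths below are hypothetical. The helpers only generate the SQL text.

```python
def psv_table_ddl(table, columns, location):
    """Athena DDL for an external table over pipe-delimited text files."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'\n"
        f"LOCATION '{location}'"
    )


def parquet_ctas(source, target, location):
    """Athena CTAS statement that copies a table into Parquet files."""
    return (
        f"CREATE TABLE {target}\n"
        f"WITH (format = 'PARQUET', external_location = '{location}')\n"
        f"AS SELECT * FROM {source}"
    )


# Hypothetical schema and bucket paths, for illustration only.
ddl = psv_table_ddl(
    "events_psv",
    [("event_id", "string"), ("event_time", "string"), ("amount", "double")],
    "s3://example-bucket/raw-psv/",
)
ctas = parquet_ctas("events_psv", "events_parquet", "s3://example-bucket/parquet/")

print(ddl)
print(ctas)
```

Later queries would then target `events_parquet`, so Athena reads only the columns a query touches instead of scanning every full text row. Whether that closes the gap with BigQuery depends on the data and the queries.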