Amazon Athena vs Apache Drill

Overview

Apache Drill: Stacks 74, Followers 171, Votes 16

Amazon Athena: Stacks 524, Followers 840, Votes 49

Amazon Athena vs Apache Drill: What are the differences?

Introduction

Amazon Athena and Apache Drill are query engines that let users run SQL against data where it already lives, such as files in object storage or a distributed file system, without first loading it into a database. Both aim to provide fast, cost-effective queries, but they differ in several key ways.

Data Source Support: Amazon Athena is primarily designed for querying data stored in Amazon S3, making it a good choice for users who already keep their data in the Amazon Web Services (AWS) ecosystem. Apache Drill, on the other hand, supports a wide range of data sources, including Hadoop Distributed File System (HDFS), NoSQL databases, traditional SQL databases, and cloud object stores such as Amazon S3 and Azure Blob Storage.
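Data Localization: Amazon Athena does not require data to be moved or partitioned before querying, but it does need a table definition (schema and S3 location) registered in a catalog such as AWS Glue. Partitioning the data is optional: it takes additional setup, but it reduces the amount of data each query scans, which improves performance and lowers cost. Apache Drill, on the other hand, can query self-describing data such as JSON and Parquet files directly, without any upfront schema definition or pre-processing.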
Query Language: Amazon Athena uses a SQL dialect based on Presto (and, in newer engine versions, Trino). It is largely ANSI SQL compatible and supports a wide range of SQL functions. Apache Drill also uses ANSI SQL rather than a separate NoSQL query language; what makes it flexible is that the same SQL, with extensions for nested data, can be run against JSON files, MongoDB collections, and other non-relational sources.
Dependency on Infrastructure: Amazon Athena is a managed service provided by AWS, which means users do not need to worry about deploying and managing the infrastructure. Apache Drill, on the other hand, requires users to set up and manage the cluster infrastructure themselves, giving them more control but also requiring more effort.
Performance Optimization: Both engines plan and optimize queries and can push filters down toward the data. In Athena the engine itself is managed by AWS, so tuning is done mostly through data layout: partitioning, compression, and columnar formats such as Parquet or ORC. Apache Drill runs on infrastructure you control, so in addition to data layout you can tune cluster size and planner and execution settings, which can help with complex queries over large datasets at the cost of more operational work.
Ecosystem Integration: Amazon Athena integrates closely with other AWS services such as AWS Glue, which can automate data cataloging and schema inference. It also integrates with Amazon QuickSight for data visualization. Apache Drill, being an open-source project, can be integrated with various tools and frameworks in the Hadoop ecosystem and with BI tools through standard JDBC/ODBC drivers, providing more flexibility in terms of ecosystem integration.
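
To make the schema difference concrete, here is a minimal sketch in Python that only builds the SQL text each engine would receive; it does not connect to either service. It assumes Drill's file-system storage plugin is registered under its usual name `dfs` and that the Athena data is JSON read with the OpenX JSON SerDe. The file path, table name, column names, and bucket are hypothetical placeholders.

```python
from typing import Mapping


def drill_file_query(path: str, limit: int = 10) -> str:
    """Build a Drill query that reads a file directly, with no table definition.

    ASSUMPTION: the file-system storage plugin is registered as `dfs`;
    the path passed in is a hypothetical example.
    """
    return f"SELECT * FROM dfs.`{path}` LIMIT {limit}"


def athena_json_table_ddl(table: str, columns: Mapping[str, str], location: str) -> str:
    """Build the table definition Athena needs before the data can be queried.

    ASSUMPTION: the files are JSON and are read with the OpenX JSON SerDe;
    table, column, and bucket names are illustrative only.
    """
    column_sql = ",\n  ".join(
        f"`{name}` {sql_type}" for name, sql_type in columns.items()
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_sql}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    print(drill_file_query("/data/events.json"))
    print(
        athena_json_table_ddl(
            "events",
            {"id": "string", "ts": "timestamp"},
            "s3://example-bucket/events/",
        )
    )
```

In Drill the first statement is the whole workflow. In Athena the DDL has to run once (or a Glue crawler has to create the table) before an ordinary SELECT against `events` will work.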

In summary, Amazon Athena and Apache Drill are both powerful query engines, but they differ in terms of data source support, data localization requirements, query language capabilities, dependency on infrastructure, performance optimization capabilities, and ecosystem integration. The choice between the two depends on specific requirements and the existing infrastructure ecosystem.


Advice on Apache Drill, Amazon Athena

Kevin

Co-founder at Transloadit

Dec 18, 2020

Review

Hey there, the trick to keeping costs under control is to partition. This means you split up your source files by date, and also query within dates, so that Athena only scans the few files necessary for those dates. I hope that makes sense (and I also hope I understood your question right). This article explains it better: https://aws.amazon.com/blogs/big-data/analyze-your-amazon-cloudfront-access-logs-at-scale/
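
The partitioning advice above can be sketched roughly as follows. This is an illustration in Python, not code from the linked article: it assumes files are laid out under Hive-style `dt=YYYY-MM-DD` prefixes and that the table declares `dt` as its partition column. The bucket, table name, and the price per terabyte are hypothetical placeholders; check current Athena pricing for your region before relying on the cost figure.

```python
from datetime import date


def partition_prefix(base: str, day: date) -> str:
    """Return the Hive-style prefix under which one day's files are stored."""
    return f"{base.rstrip('/')}/dt={day.isoformat()}/"


def date_bounded_query(table: str, start: date, end: date) -> str:
    """Build a query that filters on the partition column `dt`.

    Filtering on the partition column is what lets the engine skip
    every prefix outside the requested date range.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE dt BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'"
    )


def estimated_cost_usd(bytes_scanned: int, usd_per_tb: float = 5.0) -> float:
    """Estimate query cost from bytes scanned.

    ASSUMPTION: usd_per_tb is a placeholder price and a terabyte is
    taken as 10**12 bytes; neither value comes from the original post.
    """
    return bytes_scanned / 10**12 * usd_per_tb


if __name__ == "__main__":
    print(partition_prefix("s3://example-bucket/logs", date(2020, 12, 18)))
    print(date_bounded_query("access_logs", date(2020, 12, 1), date(2020, 12, 18)))
    print(estimated_cost_usd(10**12))
```

Because Athena bills by the amount of data scanned, a query limited to a few `dt` partitions is charged only for those files, while the same query without the `WHERE` clause is charged for the whole table.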


Pavithra

Mar 12, 2020

Needs advice on Amazon S3, Amazon Athena, Amazon Redshift

Hi all,

Currently, we need to ingest data from Amazon S3 into either Amazon Athena or Amazon Redshift. The problem is that the data is in .PSV (pipe-separated values) format and is over 200 GB in size. Query performance in Athena/Redshift is not up to the mark: queries time out or run too slowly compared to Google BigQuery. How can I optimize the performance and query result time? Can anyone please help me out?


Detailed Comparison

Description
Apache Drill: Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel.
Amazon Athena: Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Features
Apache Drill: Low-latency SQL queries; dynamic queries on self-describing data in files (such as JSON, Parquet, text) and MapR-DB/HBase tables, without requiring metadata definitions in the Hive metastore; ANSI SQL; nested data support; integration with Apache Hive (queries on Hive tables and views, support for all Hive file formats and Hive UDFs); BI/SQL tool integration using standard JDBC/ODBC drivers
Amazon Athena: none listed

Statistics
	Apache Drill	Amazon Athena
Stacks	74	524
Followers	171	840
Votes	16	49

Pros & Cons
Apache Drill pros (votes): NoSQL and Hadoop (4); Lightning speed and simplicity in face of data jungle (3); Free (3); Well documented for fast install (2); Read Structured and unstructured data (1)
Amazon Athena pros (votes): Use SQL to analyze CSV files (16); Glue crawlers gives easy Data catalogue (8); Cheap (7); Query all my data without running servers 24x7 (6); No data base servers yay (4)

Integrations
Apache Drill: No integrations available
Amazon Athena: Amazon S3, Presto

What are some alternatives to Apache Drill, Amazon Athena?

dbForge Studio for MySQL

It is a universal MySQL and MariaDB client for database management, administration, and development. This intelligent MySQL client makes working with data and code easier and more convenient. The tool provides utilities to compare, synchronize, and back up MySQL databases on a schedule, and makes it possible to analyze and report on MySQL table data.

dbForge Studio for Oracle

It is a powerful integrated development environment (IDE) that helps Oracle SQL developers increase PL/SQL coding speed and provides versatile data editing tools for managing in-database and external data.

dbForge Studio for PostgreSQL

It is a GUI tool for database development and management. The IDE for PostgreSQL allows users to create, develop, and execute queries, edit and adjust the code to their requirements in a convenient and user-friendly interface.

dbForge Studio for SQL Server

It is a powerful IDE for SQL Server management, administration, development, data reporting, and analysis. The tool helps SQL developers manage databases, version-control database changes in popular source control systems, speed up routine tasks, and make complex database changes.

Apache Spark

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Liquibase

Liquibase is the leading open-source tool for database schema change management. Liquibase helps teams track, version, and deploy database schema and logic changes so they can automate their database code process with their app code process.

Sequel Pro

Sequel Pro is a fast, easy-to-use Mac database management application for working with MySQL databases.

DBeaver

It is a free multi-platform database tool for developers, SQL programmers, database administrators and analysts. Supports all popular databases: MySQL, PostgreSQL, SQLite, Oracle, DB2, SQL Server, Sybase, Teradata, MongoDB, Cassandra, Redis, etc.

Presto

Distributed SQL Query Engine for Big Data

dbForge SQL Complete

It is an IntelliSense add-in for SQL Server Management Studio, designed to provide the fastest T-SQL query typing ever possible.
