Amazon Athena vs Azure Storage

Overview

Azure Storage: 1.3K stacks, 787 followers, 52 votes

Amazon Athena: 524 stacks, 840 followers, 49 votes

Amazon Athena vs Azure Storage: What are the differences?

Introduction

In this article, we discuss the key differences between Amazon Athena and Azure Storage. The two services overlap only partially. Amazon Athena is a serverless SQL query service for data in Amazon S3, whereas Azure Storage is Microsoft Azure's storage platform for blobs, files, queues, and tables, so they suit different use cases. Let's explore these differences in more detail below.

Data Processing Approach: The most fundamental difference is how each service processes data. Amazon Athena is a serverless query engine: users run interactive SQL queries directly against data stored in Amazon S3 without provisioning any compute. Azure Storage, by contrast, provides data storage and retrieval but has no built-in query engine like Athena. To process and analyze data stored in Azure Storage, users need an additional service, such as Azure Synapse Analytics (whose serverless SQL pool is the closest counterpart to Athena) or Azure Databricks.
Integration with Ecosystem: Each service is tightly integrated with its own cloud ecosystem. Amazon Athena works with other AWS services such as AWS Glue, AWS Lambda, and Amazon Redshift, which makes it straightforward to build data pipelines entirely within AWS. Azure Storage, in turn, is part of the Microsoft Azure ecosystem and integrates with Azure services such as Azure Data Factory, Azure Databricks, and Azure Analysis Services.
Pricing Model: Amazon Athena charges per query, based on the amount of data each query scans (priced per terabyte scanned); the data itself is stored, and billed, separately in Amazon S3. Because users pay only for the queries they run, Athena is cost-effective for ad hoc and on-demand analysis, and anything that reduces the data scanned (compression, columnar formats, partitioning) directly reduces cost. Azure Storage pricing, in contrast, is based on the amount of data stored and the number of transactions, such as reads and writes. This model suits scenarios where storing and retrieving data, rather than querying it, is the primary concern.
Data Source Support: In terms of queryable data, Athena offers more out of the box. It queries data stored in Amazon S3 using table definitions kept in the AWS Glue Data Catalog, and it supports formats such as Parquet, ORC, Avro, JSON, and CSV. Azure Storage is not a query engine, so the comparison is really about what it can hold: blobs, files, queues, and tables. Transforming or analyzing that data requires additional tools and services.
Managed Service vs. Cloud Storage: Amazon Athena is a fully managed service: AWS handles the infrastructure, scaling, and performance, so users can focus on writing queries and analyzing results instead of running servers or clusters. Azure Storage is primarily a storage service; to analyze the data it holds, users must add compute, either another managed Azure service or infrastructure they provision and manage themselves.
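To make the serverless model concrete, here is a minimal Python sketch that submits a SQL query to Athena and polls until it finishes. It assumes a client object that behaves like `boto3.client("athena")`, using that client's `start_query_execution`, `get_query_execution`, and `get_query_results` methods; the function name and its timeout and polling defaults are illustrative choices, not part of any AWS API. Note that nothing here starts a cluster: the only resource you supply is an S3 location for query results.

```python
import time
from typing import Any, Dict, List


def run_athena_query(
    client: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
    timeout_seconds: float = 300.0,
) -> List[List[str]]:
    """Submit `sql` to Athena, wait for completion, and return every row.

    `client` is assumed to behave like boto3.client("athena").
    Polling interval and timeout are illustrative defaults.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query leaves the QUEUED/RUNNING states.
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {query_id} still {state}")
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = status["QueryExecution"]["Status"].get("StateChangeReason", "")
        raise RuntimeError(f"query {query_id} {state}: {reason}")

    # Results are paginated; follow NextToken until there are no more pages.
    rows: List[List[str]] = []
    kwargs: Dict[str, str] = {"QueryExecutionId": query_id}
    while True:
        page = client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            # NULL cells have no VarCharValue; represent them as "".
            rows.append([cell.get("VarCharValue", "") for cell in row["Data"]])
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token
```

With boto3 installed and AWS credentials configured, a call might look like `run_athena_query(boto3.client("athena"), "SELECT status, COUNT(*) FROM access_logs GROUP BY status", "analytics", "s3://example-bucket/athena-results/")`, where the table, database, and bucket names are hypothetical. For a SELECT, the first returned row holds the column headers. Passing the client in, rather than creating it inside the function, keeps the sketch easy to test with a stand-in object.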

In summary, Amazon Athena is a serverless query engine for data stored in Amazon S3: it is tightly integrated with the AWS ecosystem, charges per query based on the data scanned, and can query a wide range of file formats. Azure Storage is a cloud storage service: it integrates with the Azure ecosystem, charges for stored data and transactions, and supports several storage types (blobs, files, queues, and tables), but it needs additional services for data processing and analytics.
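The difference between the two pricing models is easiest to see with a little arithmetic. The Python sketch below assumes Athena's published scheme of billing the bytes a query scans, rounded up to the nearest megabyte with a 10 MB minimum per query, and takes the per-terabyte price as a parameter. The $5/TB default is an assumption based on the commonly quoted US list price; prices vary by region and change over time. Treating 1 TB as 2**40 bytes is also an assumption. For Azure Storage, every rate is a caller-supplied placeholder, not a real price; consult the current Azure Blob Storage price sheet for your tier and region.

```python
MB = 2**20
TB = 2**40  # assumption: 1 TB treated as 2**40 bytes


def athena_billed_bytes(bytes_scanned: int, minimum_mb: int = 10) -> int:
    """Round scanned bytes up to a whole MB, applying a per-query minimum."""
    whole_mb = -(-bytes_scanned // MB)  # ceiling division
    return max(whole_mb, minimum_mb) * MB


def athena_query_cost(bytes_scanned: int, usd_per_tb: float = 5.0) -> float:
    """Pay-per-query: cost depends on data scanned, not data stored.

    usd_per_tb is an assumed list price; S3 storage is billed separately.
    """
    return athena_billed_bytes(bytes_scanned) / TB * usd_per_tb


def azure_blob_monthly_cost(
    gb_stored: float,
    write_ops: int,
    read_ops: int,
    usd_per_gb_month: float,
    usd_per_10k_writes: float,
    usd_per_10k_reads: float,
) -> float:
    """Storage plus transactions: cost grows with capacity and request count.

    All rates are hypothetical placeholders supplied by the caller.
    """
    return (
        gb_stored * usd_per_gb_month
        + write_ops / 10_000 * usd_per_10k_writes
        + read_ops / 10_000 * usd_per_10k_reads
    )
```

Under these assumptions, a query that scans 200 GiB costs 200/1024 × $5 ≈ $0.98 every time it runs, whether or not the data changes. A storage bill, by contrast, accrues monthly regardless of how often the data is queried.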


Advice on Azure Storage, Amazon Athena

Kevin

Co-founder at Transloadit

Dec 18, 2020

Review

Hey there, the trick to keeping costs under control is to partition. This means you split up your source files by date, and also query within dates, so that Athena only scans the few files needed for those dates. I hope that makes sense (and I also hope I understood your question right). This article explains it better: https://aws.amazon.com/blogs/big-data/analyze-your-amazon-cloudfront-access-logs-at-scale/
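Partitioning pays off because Athena can skip whole S3 prefixes whose partition value falls outside the query's filter. The Python sketch below models that pruning step for Hive-style keys such as `logs/dt=2020-12-18/part-0.gz`. The `dt=` naming, the key layout, and the `keys_to_scan` helper are illustrative assumptions, not Athena's internals. In a real table you would declare the column with `PARTITIONED BY (dt string)` and register the partitions (for example with `MSCK REPAIR TABLE`). You would then filter with a condition such as `WHERE dt BETWEEN '2020-12-01' AND '2020-12-07'` so that the engine performs this pruning itself.

```python
import re
from datetime import date
from typing import Iterable, List

# Matches a Hive-style date partition segment such as "dt=2020-12-18/".
# The "dt" column name is an illustrative assumption.
_PARTITION = re.compile(r"(?:^|/)dt=(\d{4}-\d{2}-\d{2})/")


def keys_to_scan(keys: Iterable[str], start: date, end: date) -> List[str]:
    """Return only the object keys whose dt partition lies in [start, end].

    Keys without a dt= segment cannot be pruned, so they are always kept:
    an unpartitioned layout forces a scan of every file.
    """
    selected = []
    for key in keys:
        match = _PARTITION.search(key)
        if match is None:
            selected.append(key)
            continue
        day = date.fromisoformat(match.group(1))
        if start <= day <= end:
            selected.append(key)
    return selected
```

Querying two days out of a year of daily partitions means reading roughly 2/365 of the files, and because Athena bills by bytes scanned, the cost shrinks in the same proportion.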

5.11k views


Pavithra

Mar 12, 2020

Needs advice on Amazon S3, Amazon Athena, and Amazon Redshift

Hi all,

Currently, we need to ingest data from Amazon S3 into a database, either Amazon Athena or Amazon Redshift. The problem is that the data is in .PSV (pipe-separated values) format and is over 200 GB in size. Query performance in Athena/Redshift is not up to the mark: queries are too slow compared with Google BigQuery and hit timeouts. How can I optimize performance and reduce query time? Can anyone please help me out?
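A common remedy is to stop querying raw pipe-separated text. You can declare the .PSV files as an external table with `|` as the field delimiter, then use an Athena CTAS (CREATE TABLE AS SELECT) statement to rewrite them once as partitioned Parquet. Parquet is columnar, so later queries scan only the columns and partitions they need. The Python helper below only builds the two SQL statements as strings. The table names, columns, S3 locations, and partition column in the usage note are hypothetical, and you should check the Athena CTAS documentation for the table properties your engine version accepts.

```python
from typing import Dict


def psv_to_parquet_statements(
    raw_table: str,
    parquet_table: str,
    columns: Dict[str, str],
    raw_location: str,
    parquet_location: str,
    partition_column: str,
) -> Dict[str, str]:
    """Build Athena DDL for raw PSV files plus a CTAS that rewrites them as Parquet.

    `columns` maps column name -> Athena type; all names and locations are
    supplied by the caller (any used in examples are hypothetical).
    """
    if partition_column not in columns:
        raise ValueError(f"partition column {partition_column!r} not in columns")

    col_defs = ",\n  ".join(f"{name} {typ}" for name, typ in columns.items())
    raw_ddl = (
        f"CREATE EXTERNAL TABLE {raw_table} (\n  {col_defs}\n)\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY '|'\n"  # pipe-separated values
        "STORED AS TEXTFILE\n"
        f"LOCATION '{raw_location}'"
    )

    # CTAS expects partition columns to come last in the SELECT list.
    ordered = [c for c in columns if c != partition_column] + [partition_column]
    ctas = (
        f"CREATE TABLE {parquet_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{parquet_location}',\n"
        f"  partitioned_by = ARRAY['{partition_column}']\n"
        ")\n"
        f"AS SELECT {', '.join(ordered)} FROM {raw_table}"
    )
    return {"raw_ddl": raw_ddl, "ctas": ctas}
```

For example, `psv_to_parquet_statements("raw_events", "events_parquet", {"dt": "string", "user_id": "bigint", "amount": "double"}, "s3://example/raw/", "s3://example/parquet/", "dt")` produces a raw table over the PSV prefix and a CTAS that selects `user_id, amount, dt`. If the files start with a header line, the raw table would also need a property such as `TBLPROPERTIES ('skip.header.line.count'='1')`. Queries filtered on `dt` against the Parquet table then benefit from both column pruning and partition pruning.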

522k views


Detailed Comparison

Description

Azure Storage: Azure Storage provides the flexibility to store and retrieve large amounts of unstructured data, such as documents and media files, with Azure Blobs; structured NoSQL data with Azure Tables; reliable messages with Azure Queues; and SMB-based file shares with Azure Files for migrating on-premises applications to the cloud.

Amazon Athena: Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Features

Azure Storage: Blobs, Tables, Queues, and Files; highly scalable; durable and highly available; Premium Storage; designed for developers

Amazon Athena: none listed

Statistics

Azure Storage: 1.3K stacks, 787 followers, 52 votes

Amazon Athena: 524 stacks, 840 followers, 49 votes

Pros & Cons (votes in parentheses)

Azure Storage pros: All-in-one storage solution (24); Pay only for data used regardless of disk size (15); Shared drive mapping (9); Cheapest hot and cloud storage (2); Cost-effective (2)

Azure Storage cons: Direct support is not provided by Azure storage (2)

Amazon Athena pros: Use SQL to analyze CSV files (16); Glue crawlers gives easy Data catalogue (8); Cheap (7); Query all my data without running servers 24x7 (6); No data base servers yay (4)

Amazon Athena cons: none listed

Integrations

Azure Storage: Microsoft Azure

Amazon Athena: Amazon S3, Presto

What are some alternatives to Azure Storage, Amazon Athena?

Amazon S3

Amazon Simple Storage Service provides a fully redundant data storage infrastructure for storing and retrieving any amount of data, at any time, from anywhere on the web

Apache Spark

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Amazon EBS

Amazon EBS volumes are network-attached, and persist independently from the life of an instance. Amazon EBS provides highly available, highly reliable, predictable storage volumes that can be attached to a running Amazon EC2 instance and exposed as a device within the instance. Amazon EBS is particularly suited for applications that require a database, file system, or access to raw block level storage.

Google Cloud Storage

Google Cloud Storage allows world-wide storing and retrieval of any amount of data and at any time. It provides a simple programming interface which enables developers to take advantage of Google's own reliable and fast networking infrastructure to perform data operations in a secure and cost effective manner. If expansion needs arise, developers can benefit from the scalability provided by Google's infrastructure.

Presto

Distributed SQL Query Engine for Big Data

Minio

MinIO is an object storage server compatible with Amazon S3. It was originally released under the Apache License 2.0 and has been licensed under the GNU AGPLv3 since 2021.

OpenEBS

OpenEBS allows you to treat your persistent workload containers, such as DBs on containers, just like other containers. OpenEBS itself is deployed as just another container on your host.

Apache Flink

Apache Flink is an open source system for fast and versatile data analytics in clusters. Flink supports batch and streaming analytics, in one system. Analytical programs can be written in concise and elegant APIs in Java and Scala.

lakeFS

It is an open-source data version control system for data lakes. It provides a “Git for data” platform enabling you to implement best practices from software engineering on your data lake, including branching and merging, CI/CD, and production-like dev/test environments.

Druid

Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.
