Amazon Athena vs Minio

Overview

Minio: Stacks 638, Followers 670, Votes 43, GitHub Stars 57.8K, Forks 6.4K

Amazon Athena: Stacks 521, Followers 840, Votes 49

Amazon Athena vs Minio: What are the differences?

Introduction:

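Amazon Athena and Minio are two different technologies used in data analysis and storage. They sit at different layers of a data stack: Athena queries data that already lives in object storage, while Minio is the object storage itself. The key differences are outlined below.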

1. Scalability and Flexibility:

Amazon Athena is a serverless interactive query service provided by AWS. It allows users to analyze data stored in Amazon S3 using standard SQL queries. Athena automatically scales underlying resources based on the query load and provides flexibility in data analysis. On the other hand, Minio is an open-source object storage system that can be deployed on-premises or in the cloud. It offers high scalability and flexibility in storing unstructured data, making it suitable for a wide range of use cases.

2. Cost Model:

Amazon Athena follows a pay-as-you-go pricing model. Users are charged based on the amount of data scanned during query execution, making it cost-effective for sporadic or smaller workloads. Minio, on the other hand, is open source and free to download. Organizations can build their storage infrastructure on it without per-query or per-gigabyte fees, although they still pay for the hardware or cloud instances it runs on and for operating them. That can make it a more budget-friendly option for long-term data storage.
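To make the per-scan pricing concrete, here is a minimal sketch of a cost estimate. The default rate of 5 USD per TB scanned, the 10 MB per-query minimum, and the use of binary units are assumptions for illustration only; actual rates vary by region and over time, so check the current AWS pricing page before relying on the numbers.

```python
import math

MB = 1024 ** 2  # assumption: binary megabytes
TB = 1024 ** 4  # assumption: binary terabytes


def estimate_athena_cost(bytes_scanned, price_per_tb=5.0, min_bytes=10 * MB):
    """Estimate the charge in USD for one query from the bytes it scanned.

    price_per_tb and min_bytes are illustrative defaults, not authoritative.
    """
    # Round up to whole megabytes, then apply the per-query minimum.
    billable = max(math.ceil(bytes_scanned / MB) * MB, min_bytes)
    return billable / TB * price_per_tb


# A query that scans a full 200 GB dataset.
full_scan = estimate_athena_cost(200 * 1024 ** 3)

# The same query when partitioning or a columnar format cuts the scan to 5 GB.
pruned_scan = estimate_athena_cost(5 * 1024 ** 3)

print(f"full scan:   ${full_scan:.4f}")
print(f"pruned scan: ${pruned_scan:.4f}")
```

Because the charge tracks bytes scanned rather than query runtime, reducing the data each query touches lowers the bill directly.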

3. Integration with Ecosystem:

As a part of AWS, Amazon Athena integrates with other AWS services such as Amazon CloudWatch, AWS Glue, and AWS Lake Formation. This covers data ingestion, data cataloging, and monitoring within the AWS ecosystem. In contrast, Minio exposes an S3-compatible API, so tools and SDKs written for Amazon S3 can generally work against it. Because it can be deployed on different cloud platforms or on-premises, it offers flexibility in building a multi-cloud or hybrid cloud environment.
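Since Minio is compatible with Amazon S3, the same client code can usually target either service by changing only the endpoint. The sketch below assumes the third-party boto3 library and a hypothetical local Minio server at http://localhost:9000 with placeholder credentials; none of these values come from the comparison itself.

```python
def s3_client_kwargs(endpoint_url=None, access_key=None, secret_key=None):
    """Build keyword arguments for boto3.client("s3", ...).

    With no arguments, boto3 falls back to AWS defaults (Amazon S3).
    With endpoint_url set, requests go to that S3-compatible server instead.
    """
    kwargs = {}
    if endpoint_url:
        kwargs["endpoint_url"] = endpoint_url
    if access_key and secret_key:
        kwargs["aws_access_key_id"] = access_key
        kwargs["aws_secret_access_key"] = secret_key
    return kwargs


def make_s3_client(**kwargs):
    """Create the client; boto3 is imported lazily because it is third-party."""
    import boto3

    return boto3.client("s3", **kwargs)


# Amazon S3: rely on the default endpoint and credential chain.
aws_kwargs = s3_client_kwargs()

# Minio: hypothetical local endpoint and placeholder credentials.
minio_kwargs = s3_client_kwargs(
    "http://localhost:9000", "EXAMPLE_ACCESS_KEY", "EXAMPLE_SECRET_KEY"
)
```

Calling `make_s3_client(**minio_kwargs)` would then return a client whose bucket and object calls are sent to the Minio server rather than to AWS.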

4. Data Security and Compliance:

Amazon Athena, being a managed service, inherits the robust security measures provided by AWS. It supports encryption at rest and in transit, IAM authentication, and access control policies, ensuring data security and compliance with industry standards. Minio also provides security features like server-side encryption, access control, and secure access policies, making it a secure option for data storage.
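As an illustration of what an access control policy can look like, the sketch below builds a read-only policy document in the IAM JSON format used for Amazon S3. The bucket name is hypothetical. Minio's access policies are understood to use a compatible JSON syntax, but treat that as an assumption and verify it against the documentation for your Minio version.

```python
import json


def read_only_policy(bucket):
    """Return an IAM-style policy allowing listing and reading one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# "example-data-lake" is a placeholder bucket name.
policy = read_only_policy("example-data-lake")
print(json.dumps(policy, indent=2))
```

Granting only list and read actions is a common starting point for users who query data but should never modify it.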

5. Query Performance:

Amazon Athena's query engine is based on Presto (more recent engine versions build on its fork, Trino), which is optimized for interactive querying. It distributes work across many nodes and processes queries in parallel. Minio is an object storage system rather than a query engine, so it does not run complex, ad-hoc SQL on its own. To query data stored in Minio, you pair it with an engine such as Presto or Apache Spark.

Summary:

In summary, Amazon Athena is a scalable serverless query service provided by AWS, offering seamless integration with other AWS services, cost-effective pricing, strong security, and high query performance. On the other hand, Minio is an open-source object storage system with high scalability, flexibility, cost-effectiveness, and compatibility with various cloud platforms.

Advice on Minio, Amazon Athena

Pavithra, Mar 12, 2020

Needs advice on Amazon S3, Amazon Athena, Amazon Redshift

Hi all,

Currently, we need to load data from Amazon S3 into either Amazon Athena or Amazon Redshift. The problem is that the data is in .PSV (pipe-separated values) format and is more than 200 GB in size. Queries in Athena/Redshift time out or run too slowly compared to Google BigQuery. How can I improve query performance and response time? Can anyone please help me out?
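One commonly suggested direction for a case like this is to declare the pipe-delimited files as an external table in Athena and then rewrite them into a columnar format such as Parquet, so that later queries scan less data. The sketch below only builds the SQL strings and does not run them. All table, column, and bucket names are hypothetical, and whether this resolves the slowdown described above depends on the data and query patterns.

```python
def psv_table_ddl(table, columns, location):
    """Build Athena DDL for an external table over pipe-separated files."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'\n"
        f"LOCATION '{location}'"
    )


def parquet_ctas(source_table, target_table, location):
    """Build a CREATE TABLE AS SELECT that rewrites the data as Parquet."""
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (format = 'PARQUET', external_location = '{location}')\n"
        f"AS SELECT * FROM {source_table}"
    )


# Hypothetical schema and S3 locations, for illustration only.
ddl = psv_table_ddl(
    "raw_events",
    [("event_id", "string"), ("event_time", "timestamp"), ("amount", "double")],
    "s3://example-bucket/raw/events/",
)
ctas = parquet_ctas("raw_events", "events_parquet", "s3://example-bucket/parquet/events/")

print(ddl)
print(ctas)
```

Columnar files let the engine read only the columns a query references, which matters for Athena in particular because it bills by the amount of data scanned.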

Detailed Comparison

Minio	Amazon Athena
Minio is an object storage server compatible with Amazon S3. It was originally released under the Apache 2.0 license; recent releases are licensed under GNU AGPLv3.	Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
Statistics
	Minio	Amazon Athena
GitHub Stars	57.8K	-
GitHub Forks	6.4K	-
Stacks	638	521
Followers	670	840
Votes	43	49
Pros & Cons
Minio pros: Store and Serve Resumes & Job Description PDF, Backups (10); S3 Compatible (8); Simple (4); Open Source (4); Encryption and Tamper-Proof (3)
Minio cons: Deletion of huge buckets is not possible (3)
Amazon Athena pros: Use SQL to analyze CSV files (16); Glue crawlers gives easy Data catalogue (8); Cheap (7); Query all my data without running servers 24x7 (6); No data base servers yay (4)
Integrations
Minio: Amazon S3
Amazon Athena: Amazon S3, Presto

What are some alternatives to Minio, Amazon Athena?

Amazon S3

Amazon Simple Storage Service provides a fully redundant data storage infrastructure for storing and retrieving any amount of data, at any time, from anywhere on the web

Apache Spark

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Amazon EBS

Amazon EBS volumes are network-attached, and persist independently from the life of an instance. Amazon EBS provides highly available, highly reliable, predictable storage volumes that can be attached to a running Amazon EC2 instance and exposed as a device within the instance. Amazon EBS is particularly suited for applications that require a database, file system, or access to raw block level storage.

Google Cloud Storage

Google Cloud Storage allows world-wide storing and retrieval of any amount of data and at any time. It provides a simple programming interface which enables developers to take advantage of Google's own reliable and fast networking infrastructure to perform data operations in a secure and cost effective manner. If expansion needs arise, developers can benefit from the scalability provided by Google's infrastructure.

Presto

Distributed SQL Query Engine for Big Data

Azure Storage

Azure Storage provides the flexibility to store and retrieve large amounts of unstructured data, such as documents and media files with Azure Blobs; structured nosql based data with Azure Tables; reliable messages with Azure Queues, and use SMB based Azure Files for migrating on-premises applications to the cloud.

OpenEBS

OpenEBS allows you to treat your persistent workload containers, such as DBs on containers, just like other containers. OpenEBS itself is deployed as just another container on your host.

Apache Flink

Apache Flink is an open source system for fast and versatile data analytics in clusters. Flink supports batch and streaming analytics, in one system. Analytical programs can be written in concise and elegant APIs in Java and Scala.

lakeFS

It is an open-source data version control system for data lakes. It provides a “Git for data” platform enabling you to implement best practices from software engineering on your data lake, including branching and merging, CI/CD, and production-like dev/test environments.

Druid

Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.
