Amazon Athena vs Amazon RDS for PostgreSQL

Overview

Amazon RDS for PostgreSQL: Stacks 815, Followers 607, Votes 40

Amazon Athena: Stacks 524, Followers 840, Votes 49

Amazon Athena vs Amazon RDS for PostgreSQL: What are the differences?

Key Differences between Amazon Athena and Amazon RDS for PostgreSQL

Amazon Athena and Amazon RDS for PostgreSQL are both Amazon Web Services (AWS) offerings for working with data, but they address different needs: Athena queries files that already sit in Amazon S3, while RDS for PostgreSQL runs a relational database instance for you. The key differences are:

Querying Approach:
- Amazon Athena is a serverless service that lets you run SQL queries directly on data stored in Amazon S3 without managing any infrastructure. It is built on Presto, an open-source distributed SQL query engine (newer Athena engine versions are based on Trino, a fork of Presto), to execute complex queries on large datasets.
- On the other hand, Amazon RDS for PostgreSQL is a managed relational database service that provides you with a fully managed PostgreSQL database instance. It offers traditional SQL-based querying capabilities using the PostgreSQL engine.
Database Management:
- Amazon Athena does not require database administration, because it is designed to query data in place in Amazon S3. You only define table metadata over your existing files (for example, in the AWS Glue Data Catalog); there is no data loading step and no server to maintain.
- In contrast, with Amazon RDS for PostgreSQL you still choose an instance class, configure database parameters, and monitor the instance. RDS automates the heavier administrative work, such as backups, software patching, and scaling operations.
Scalability and Performance:
- Amazon Athena is highly scalable and can handle large volumes of data for querying since it runs queries directly on data stored in Amazon S3. It automatically scales resources based on the query complexity and dataset size.
- Amazon RDS for PostgreSQL allows you to scale vertically by increasing the compute and storage capacity of your PostgreSQL instance. It also offers read replicas to improve read performance for read-heavy workloads.
Data Format and Storage:
- Amazon Athena can query data stored in Amazon S3 in file formats such as CSV, JSON, Parquet, and ORC. You can query the files where they are, with no mandatory ETL step, although converting to a columnar format such as Parquet or ORC typically reduces the amount of data each query scans.
- In Amazon RDS for PostgreSQL, the data is typically stored in a structured manner within the PostgreSQL database instance. It is suitable for applications that require structured data storage and relational capabilities.
Cost Model:
- Amazon Athena follows a pay-per-query pricing model where you only pay for the amount of data scanned by your queries. You are not required to provision or pay for any fixed infrastructure when using Athena.
- Amazon RDS for PostgreSQL follows a different pricing model based on the instance size, storage capacity, and additional features utilized. You need to provision and pay for the PostgreSQL instance even if it is idle.
Use Cases:
- Amazon Athena is well-suited for ad-hoc querying, interactive analytics, and data exploration scenarios. It allows you to quickly gain insights from large datasets stored in Amazon S3 without the need for upfront preparations or data loading.
- On the other hand, Amazon RDS for PostgreSQL is better suited for applications that require a traditional, managed relational database with advanced SQL querying capabilities, transaction support, and ACID compliance.
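
The cost-model difference above can be made concrete with a rough break-even calculation. The sketch below is illustrative only: both prices are assumptions rather than quoted AWS rates (check the current pricing pages for your region), the function names are hypothetical, and storage costs for S3 and RDS are left out.

```python
import math

# Illustrative assumptions, NOT quoted AWS prices.
ATHENA_USD_PER_TB = 5.00   # assumed price per TB of data scanned
RDS_USD_PER_HOUR = 0.20    # assumed hourly rate for a hypothetical instance class
HOURS_PER_MONTH = 730
TB = 1024 ** 4


def athena_monthly_cost(bytes_per_query, queries_per_month,
                        usd_per_tb=ATHENA_USD_PER_TB):
    """Pay-per-query: cost grows with the data scanned, and is zero when idle."""
    return bytes_per_query / TB * usd_per_tb * queries_per_month


def rds_monthly_cost(usd_per_hour=RDS_USD_PER_HOUR, hours=HOURS_PER_MONTH):
    """Provisioned instance: the same charge whether or not any query runs."""
    return usd_per_hour * hours


def break_even_queries(bytes_per_query,
                       usd_per_tb=ATHENA_USD_PER_TB,
                       usd_per_hour=RDS_USD_PER_HOUR):
    """Smallest monthly query count at which Athena costs at least as much as the instance."""
    cost_per_query = bytes_per_query / TB * usd_per_tb
    return math.ceil(rds_monthly_cost(usd_per_hour) / cost_per_query)


if __name__ == "__main__":
    scan = 100 * 1024 ** 3  # each query scans 100 GB
    print(athena_monthly_cost(scan, 50))   # a light ad-hoc workload
    print(rds_monthly_cost())              # fixed monthly instance charge
    print(break_even_queries(scan))        # queries/month where the two meet
```

Under these assumed prices, occasional queries favor Athena and a steady, heavy query load favors a provisioned instance. Reducing the bytes scanned per query (columnar formats, partitioning) moves the break-even point further out.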

In summary, Amazon Athena is a serverless query service for analyzing data in Amazon S3 without the need for infrastructure management, while Amazon RDS for PostgreSQL is a managed PostgreSQL database service that requires traditional database management tasks and offers advanced SQL capabilities.

Advice on Amazon RDS for PostgreSQL, Amazon Athena

Lonnie

CEO - Co-founder US, Mexico Binational Tech Start-up Accelerator, Incubator at Framework Science

May 9, 2019

Review on Amazon DynamoDB, Amazon RDS for PostgreSQL

We use Amazon RDS for PostgreSQL because RDS and Amazon DynamoDB are two distinct database systems. DynamoDB is a NoSQL database, whereas RDS is a relational database in the cloud. The pricing difference depends mainly on the type of application you are running and on your requirements. For some applications both DynamoDB and RDS can serve well; for others, one of them might not. I do not think DynamoDB is cheaper. Right now we are helping companies in Silicon Valley and in Southern California go serverless, drastically lowering costs, if you are interested in hearing how we go about it.

9.19k views

Jorge

Jan 15, 2020

Needs advice

We are considering moving part of our PostgreSQL database infrastructure to the cloud, but we are not sure which to choose among AWS, Heroku, Azure, and Google Cloud. The main reason is to back up and centralize all our data in the cloud. With that in mind, the main considerations are:
- Pricing for storage.
- Small team.
- No need for high throughput.
- Support for Docker Swarm and Kubernetes.

51.8k views

Pavithra

Mar 12, 2020

Needs advice on Amazon S3, Amazon Athena, Amazon Redshift

Hi all,

Currently, we need to ingest data from Amazon S3 into either Amazon Athena or Amazon Redshift. The problem is that the data is in .PSV (pipe-separated values) format and is over 200 GB in size. Query performance in Athena/Redshift is not up to the mark: queries are too slow, or time out, compared with Google BigQuery. How can I optimize the performance and query result time? Can anyone please help me out?

522k views
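
One direction often suggested for a question like the one above is to expose the pipe-separated files to Athena as an external table and then rewrite them once as partitioned Parquet, so later queries scan less data. The sketch below only builds the two SQL statements as strings and does not call AWS. The helper names, table names, column names, and bucket paths are all hypothetical, and it assumes simple identifiers that need no quoting and files that begin with a header row.

```python
def psv_external_table_ddl(table, columns, s3_location, skip_header=True):
    """Build Athena DDL for an external table over pipe-separated text files.

    `columns` is a list of (name, type) pairs; all names here are illustrative.
    """
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    ddl = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}'"
    )
    if skip_header:
        # Assumes each file starts with one header line.
        ddl += "\nTBLPROPERTIES ('skip.header.line.count'='1')"
    return ddl


def parquet_ctas(source_table, target_table, column_names, s3_location,
                 partition_column=None):
    """Build a CTAS statement that rewrites the source table as Parquet.

    In an Athena CTAS, partition columns must come last in the SELECT list,
    so the partition column (if any) is moved to the end.
    """
    props = ["format = 'PARQUET'", f"external_location = '{s3_location}'"]
    select_cols = [c for c in column_names if c != partition_column]
    if partition_column is not None:
        props.append(f"partitioned_by = ARRAY['{partition_column}']")
        select_cols.append(partition_column)
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH ({', '.join(props)})\n"
        f"AS SELECT {', '.join(select_cols)} FROM {source_table}"
    )


if __name__ == "__main__":
    cols = [("event_date", "string"), ("event_id", "string"), ("amount", "double")]
    print(psv_external_table_ddl("events_psv", cols, "s3://example-bucket/raw/"))
    print()
    print(parquet_ctas("events_psv", "events_parquet",
                       [name for name, _ in cols],
                       "s3://example-bucket/parquet/",
                       partition_column="event_date"))
```

Whether this helps depends on the query patterns: partitioning pays off only when queries filter on the partition column, and a column with too many distinct values can exceed per-query partition limits.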

Detailed Comparison

Description
- Amazon RDS for PostgreSQL: Amazon RDS manages complex and time-consuming administrative tasks such as PostgreSQL software installation and upgrades, storage management, replication for high availability, and backups for disaster recovery. With just a few clicks in the AWS Management Console, you can deploy a PostgreSQL database with automatically configured database parameters for optimal performance. Amazon RDS for PostgreSQL database instances can be provisioned with either standard storage or Provisioned IOPS storage. Once provisioned, you can scale from 10GB to 3TB of storage and from 1,000 IOPS to 30,000 IOPS.
- Amazon Athena: Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Features
Amazon RDS for PostgreSQL:
- Monitoring and Metrics: Amazon RDS provides Amazon CloudWatch metrics for your DB Instance deployments at no additional charge.
- DB Event Notifications: Amazon RDS provides Amazon SNS notifications via email or SMS for your DB Instance deployments.
- Automatic Software Patching: Amazon RDS will make sure that the PostgreSQL software powering your deployment stays up-to-date with the latest patches.
- Automated Backups: Turned on by default, the automated backup feature of Amazon RDS enables point-in-time recovery for your DB Instance.
- DB Snapshots: DB Snapshots are user-initiated backups of your DB Instance.
- Pre-configured Parameters: Amazon RDS for PostgreSQL deployments are pre-configured with a sensible set of parameters and settings appropriate for the DB Instance class you have selected.
- PostGIS
- Language Extensions: PL/Perl, PL/pgSQL, PL/Tcl
- Full Text Search Dictionaries
- Advanced Data Types: HStore, JSON
- Core PostgreSQL engine features
Amazon Athena: none listed

Statistics
- Stacks: 815 (Amazon RDS for PostgreSQL), 524 (Amazon Athena)
- Followers: 607 (Amazon RDS for PostgreSQL), 840 (Amazon Athena)
- Votes: 40 (Amazon RDS for PostgreSQL), 49 (Amazon Athena)

Pros & Cons
Amazon RDS for PostgreSQL pros (votes in parentheses):
- Easy setup, backup, monitoring (25)
- Geospatial support (13)
- Master-master replication using Multi-AZ instance (2)
Amazon Athena pros (votes in parentheses):
- Use SQL to analyze CSV files (16)
- Glue crawlers give an easy data catalogue (8)
- Cheap (7)
- Query all my data without running servers 24x7 (6)
- No database servers, yay (4)

Integrations
- Amazon RDS for PostgreSQL: no integrations available
- Amazon Athena: Amazon S3, Presto

What are some alternatives to Amazon RDS for PostgreSQL, Amazon Athena?

Apache Spark

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Presto

Distributed SQL Query Engine for Big Data

Heroku Postgres

Heroku Postgres provides a SQL database-as-a-service that lets you focus on building your application instead of messing around with database management.

Apache Flink

Apache Flink is an open source system for fast and versatile data analytics in clusters. Flink supports batch and streaming analytics, in one system. Analytical programs can be written in concise and elegant APIs in Java and Scala.

lakeFS

It is an open-source data version control system for data lakes. It provides a “Git for data” platform enabling you to implement best practices from software engineering on your data lake, including branching and merging, CI/CD, and production-like dev/test environments.

Druid

Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.

Apache Kylin

Apache Kylin™ is an open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop/Spark supporting extremely large datasets, originally contributed from eBay Inc.

Splunk

It provides the leading platform for Operational Intelligence. Customers use it to search, monitor, analyze and visualize machine data.

Apache Impala

Impala is a modern, open source, MPP SQL query engine for Apache Hadoop. Impala is shipped by Cloudera, MapR, and Amazon. With Impala, you can query data, whether stored in HDFS or Apache HBase – including SELECT, JOIN, and aggregate functions – in real time.

Vertica

It provides a best-in-class, unified analytics platform that will forever be independent from underlying infrastructure.
