Amazon Athena

Needs advice
on
Ali9t apache aws
and
Trino

Could you please suggest the best database engine for on-premise use? We have used Amazon Athena in the cloud and are looking for a similar product for on-premise. It should support the Node.js programming language.

2 upvotes·1.8K views
Needs advice
on
Amazon Athena
and
Amazon DynamoDB

So, I have data in Amazon S3 as parquet files and I have it available in the Glue data catalog too. I want to build an AppSync API on top of this data. Now the two options that I am considering are:

  1. Bring the data to Amazon DynamoDB and then build my API on top of this Database.

  2. Add a Lambda function that resolves Amazon Athena queries made by AppSync.

Which of the two approaches will be cost effective?

I would really appreciate some back of the envelope estimates too.

Note: I only expect to make read queries. Thanks.

5 upvotes·21.8K views
Replies (2)
Lead Developer at Di-Vision Consultion·
Recommends
on
Amazon DynamoDB

Overall, I would think that if the data fits in AWS DynamoDB and you are able to Query it (not Scan), that would be a bit more cost-effective. But it all depends on the size of, and the rate of change in, the data.

On relatively stale data, Athena could be cheaper under big loads when the data is processed via Glue; the Lambda costs are quite small. DynamoDB could become expensive under big loads of reads and/or writes.
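For reference, option 2 from the question can be sketched roughly as below. This is a hypothetical minimal resolver, not a complete implementation: the database name, table handling, and the results bucket (`s3://my-athena-results/`) are assumptions.

```python
import time

def build_query(table, limit=100):
    # Read-only SELECT with a LIMIT; Athena bills per TB scanned,
    # so keeping queries narrow is the main cost lever.
    return f'SELECT * FROM "my_database"."{table}" LIMIT {int(limit)}'

def handler(event, context):
    import boto3  # provided in the Lambda runtime
    athena = boto3.client("athena")
    qid = athena.start_query_execution(
        QueryString=build_query(event["arguments"]["table"]),
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )["QueryExecutionId"]
    # Athena is asynchronous: poll until the query reaches a terminal state.
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(0.5)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    # Return the raw rows to AppSync; a real resolver would reshape them.
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
```

Note that a per-request Athena query adds multi-second latency to the API, which matters for the cost/UX trade-off discussed above.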

3 upvotes·2K views
Engineering leader at Cyren·

It really depends on the data size and the number of requests. If latency isn't an issue for this use case, Amazon Athena might be a good solution, provided you partition the data correctly to make it effective enough. DynamoDB is a key-value DB, so it really depends on the use case: you might not be able to retrieve the relevant info.

2 upvotes·2.7K views
Senior Product Engineer ·

Hey all, I need some suggestions on creating a replica of our RDS DB for reporting and analytical purposes. Cost is a major factor. I was thinking of using AWS Glue to move data from Amazon RDS to Amazon S3 and using Amazon Athena to run queries on it. Any other suggestions would be appreciated.

2 upvotes·60.3K views
Replies (1)
VP Engineering at Onefootball·

If cost is a major factor, I suggest you either A) look at open-source tools that you can run on compute you already pay for, or B) use AWS services within the free tier.

For option A), check out Singer taps and targets. For option B), check out AWS DMS (Database Migration Service). It's made for replicating data, and your use case is described here: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html

3 upvotes·706 views

Hi all,

Currently, we need to ingest data from Amazon S3 into a DB, either Amazon Athena or Amazon Redshift. But the problem with the data is that it is in .PSV (pipe-separated values) format, and its size is above 200 GB. Query performance in Athena/Redshift is not up to the mark: queries time out or run too slowly compared to Google BigQuery. How would I optimize the performance and query result time? Can anyone please help me out?

3 upvotes·488.1K views
Replies (4)

You can use the AWS Glue service to convert your pipe-format data to Parquet format, and thus achieve data compression. Then you should choose Redshift to copy your data into, as it is very large. To manage your data, partition it in the S3 bucket and also distribute it across the Redshift cluster.
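The conversion step suggested above could be sketched as follows. This is a hypothetical sketch of what a Glue (or plain Python) job would do; the output layout and the `dt` partition column are assumptions, and the Parquet step relies on pandas/pyarrow being available.

```python
def split_psv_line(line):
    # PSV is just CSV with "|" as the delimiter
    return line.rstrip("\n").split("|")

def partition_key(date_str):
    # Hive-style partition folder that Athena recognizes automatically
    return f"dt={date_str}"

def psv_to_parquet(src_path, out_dir, date_str):
    import pandas as pd  # pandas + pyarrow do the heavy lifting in a Glue job
    df = pd.read_csv(src_path, sep="|")  # read the pipe-separated source
    dest = f"{out_dir}/{partition_key(date_str)}/part-0.parquet"
    # Columnar + compressed: Athena scans only the columns a query touches
    df.to_parquet(dest, compression="snappy")
    return dest
```

Laying files out under `dt=...` prefixes is what lets Athena prune partitions instead of scanning all 200 GB on every query.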

7 upvotes·191.8K views
Data Technologies Manager at SDG Group Iberia·
Recommends
on
Amazon Redshift

First of all, you should choose between Redshift and Athena based on your use case, since they are two very different services: Redshift is an enterprise-grade MPP data warehouse, while Athena is a SQL layer on top of S3 with limited performance. If performance is a key factor, users are going to execute unpredictable queries, and direct and management costs are not a problem, I'd definitely go for Redshift. If performance is not so critical and queries will be somewhat predictable, I'd go for Athena.

Once you select the technology, you'll need to optimize your data to get queries executed as fast as possible. In both cases you may need to adapt the data model to fit your queries better. If you go for Athena, you'd also probably need to change your file format to Parquet or Avro and review your partition strategy based on your most frequent type of query. If you choose Redshift, you'll need to ingest the data from your files into it and perhaps carry out some tuning tasks for performance gain.
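If you take the Athena route, registering the converted Parquet data as a partitioned external table might look roughly like the DDL built below. This is a hedged sketch: the database, table, columns, and S3 location are all assumptions for illustration.

```python
def create_table_ddl(database, table, location):
    # Partitioning by dt lets Athena skip whole S3 prefixes at query time;
    # after creating the table, run MSCK REPAIR TABLE (or ALTER TABLE ADD
    # PARTITION) so Athena discovers the dt=... folders.
    return (
        f"CREATE EXTERNAL TABLE {database}.{table} (\n"
        "  id string,\n"
        "  amount double\n"
        ")\n"
        "PARTITIONED BY (dt string)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )
```

A query that filters on `dt` then reads only the matching partitions, which is usually the single biggest lever on both speed and per-TB-scanned cost.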

I'd recommend Redshift for now, since it can address a wider range of use cases, but we could give you better advice if you described your use case in more depth.

5 upvotes·233.1K views