Hi all,

We currently need to ingest data from Amazon S3 into either Amazon Athena or Amazon Redshift. The problem is that the data is in .PSV (pipe-separated values) format and is over 200 GB in size. Query performance in Athena/Redshift is not up to the mark: queries time out or run too slowly compared to Google BigQuery. How can I optimize performance and query response time? Can anyone please help me out?

3 upvotes·513.9K views
Replies (4)
alexisblandin9870 · Architect at CGI · Recommends Amazon Athena

It depends on the nature of your data (structured or not?) and, of course, on your queries (ad hoc or predictable?). For example, you can look at partitioning and a columnar format to make the most of the MPP capabilities of both Athena and Redshift.
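To make the partitioning and columnar-format suggestion concrete, here is a minimal sketch that builds an Athena CTAS statement rewriting a table as partitioned Parquet. It assumes the PSV files are already exposed as an external table with `|` as the field delimiter; the table names, column names, bucket path, and the `build_ctas` helper are all hypothetical, not something from the original post.

```python
def build_ctas(source_table, target_table, target_location, columns, partition_column):
    """Return an Athena CTAS statement that rewrites a table as partitioned Parquet.

    All arguments are illustrative placeholders supplied by the caller.
    """
    # Athena expects partition columns to come last in the SELECT list.
    select_list = ", ".join(list(columns) + [partition_column])
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (\n"
        f"    format = 'PARQUET',\n"
        f"    external_location = '{target_location}',\n"
        f"    partitioned_by = ARRAY['{partition_column}']\n"
        f")\n"
        f"AS SELECT {select_list}\n"
        f"FROM {source_table}"
    )


# Hypothetical names: raw_events_psv is the external table over the PSV files.
sql = build_ctas(
    source_table="raw_events_psv",
    target_table="events_parquet",
    target_location="s3://example-bucket/events_parquet/",
    columns=["event_id", "customer_id", "amount"],
    partition_column="event_date",
)
print(sql)
```

Queries that filter on the partition column then only scan the matching S3 prefixes, and Parquet lets the engine read just the columns a query touches. Note that a single Athena CTAS query can write at most 100 partitions, so a fine-grained partition column may need several passes.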

2 upvotes·253.6K views

You can use AWS Glue to convert your pipe-separated data to Parquet, which also compresses it. Because the dataset is very large, you should then choose Redshift and copy the data into it. To manage the data, partition it in the S3 bucket and also distribute it across the Redshift cluster.
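As a rough illustration of the Redshift side of this advice, the sketch below generates a table definition with a distribution key and sort key, plus a `COPY` command that loads Parquet files from S3. The table, columns, S3 prefix, IAM role ARN, and the `build_redshift_load` helper are hypothetical, and it assumes the Parquet files contain every table column in the same order.

```python
def build_redshift_load(table, columns, dist_key, sort_key, s3_prefix, iam_role):
    """Return (ddl, copy_sql) for a Redshift table loaded from Parquet in S3.

    columns is a list of (name, sql_type) pairs; all values are placeholders.
    """
    names = [name for name, _ in columns]
    # Fail early if the chosen keys are not actual columns of the table.
    for key in (dist_key, sort_key):
        if key not in names:
            raise ValueError(f"{key!r} is not a column of {table}")

    column_defs = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    # DISTKEY spreads rows across the cluster's slices by the chosen column;
    # SORTKEY orders rows on disk so range filters can skip blocks.
    ddl = (
        f"CREATE TABLE {table} (\n"
        f"    {column_defs}\n"
        f")\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({dist_key})\n"
        f"SORTKEY ({sort_key});"
    )
    copy_sql = (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS PARQUET;"
    )
    return ddl, copy_sql


# Hypothetical table layout, bucket, and role ARN.
ddl, copy_sql = build_redshift_load(
    table="events",
    columns=[("event_id", "BIGINT"), ("customer_id", "BIGINT"),
             ("amount", "DECIMAL(12,2)"), ("event_date", "DATE")],
    dist_key="customer_id",
    sort_key="event_date",
    s3_prefix="s3://example-bucket/events_parquet/",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(ddl)
print(copy_sql)
```

A good distribution key is usually a high-cardinality column that is often used in joins, so rows spread evenly across the cluster; the choice of `customer_id` here is only an example.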

7 upvotes·213.1K views