Aditya Tyagi
Needs advice on Airflow and AWS Lambda
I have data stored in an Amazon S3 bucket in Parquet file format.

I want this data to be copied from S3 to Amazon Redshift, so I use COPY commands to achieve this. But I need to do this manually. I want to achieve this with some sort of automation, such that if any new file lands in S3, it is copied to the required table in Redshift. Can you suggest what different approaches I can use?

Backend Software Engineer · Recommends AWS SNS and Amazon S3

Hello Aditya, I haven't tried this myself, but theoretically you can harness the power of Amazon S3 events to generate an event whenever there is a CRUD operation on your Parquet files in S3. First you'd have to generate the event: https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html
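
For reference, a notification like that can be wired up with boto3. This is only a rough sketch: the bucket name, Lambda function ARN, and ".parquet" suffix filter are placeholders I've made up, and an SNS topic could be targeted instead via TopicConfigurations.

```python
# Sketch: route S3 ObjectCreated events for .parquet keys to a Lambda function.
# Bucket name and Lambda ARN below are placeholders, not real resources.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-parquet-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": (
                    "arn:aws:lambda:us-east-1:123456789012:function:copy-to-redshift"
                ),
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": ".parquet"}]
                    }
                },
            }
        ]
    },
)
```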

Then you would want to subscribe to the event and use the event info to pull the file into Redshift: https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-event-notifications.html#working-with-event-notifications-subscribe
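
A minimal sketch of the subscriber side, assuming a Lambda function that receives the S3 event and issues a COPY through the Redshift Data API; the cluster, database, user, schema/table, and IAM role ARN are all placeholders:

```python
# Sketch of a Lambda handler: for each new S3 object, run a COPY into Redshift
# via the Redshift Data API. All resource names below are placeholders.
import boto3

redshift_data = boto3.client("redshift-data")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        copy_sql = (
            f"COPY my_schema.my_table "
            f"FROM 's3://{bucket}/{key}' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' "
            "FORMAT AS PARQUET;"
        )
        # The Data API runs the statement server-side, so the function
        # doesn't need a database driver bundled with it.
        redshift_data.execute_statement(
            ClusterIdentifier="my-redshift-cluster",
            Database="my_db",
            DbUser="my_user",
            Sql=copy_sql,
        )
```

A Postgres client connecting directly to the cluster and running the same COPY would work just as well; the Data API simply keeps the function dependency-free.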

-Scott


You can use the AWS Glue service to convert your pipe-delimited data to Parquet format, and thus achieve data compression. You should then use Redshift's COPY to load the data, as it is very large. To manage the data, partition it in the S3 bucket and also distribute it across the Redshift cluster.
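
As a rough illustration of that conversion step (the bucket paths and the "event_date" partition column are placeholders I've invented, not anything from the original answer), a Glue job along these lines reads the pipe-delimited files and writes partitioned Parquet back to S3:

```python
# Sketch of a Glue job: convert pipe-delimited files to partitioned Parquet.
# Bucket paths and the partition column are placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the pipe-separated source files from S3.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-source-bucket/psv/"]},
    format="csv",
    format_options={"separator": "|", "withHeader": True},
)

# Write compressed, columnar Parquet, partitioned so downstream loads or
# scans only touch the slices they need.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={
        "path": "s3://my-target-bucket/parquet/",
        "partitionKeys": ["event_date"],
    },
    format="parquet",
)

job.commit()
```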


You can convert your PSV (pipe-separated values) data to Parquet file format with AWS Glue, and your query performance will improve.
