What is AWS Glue?
Who uses AWS Glue?
AWS Glue Integrations
Here are some stack decisions, common use cases and reviews by companies and developers who chose AWS Glue in their tech stack.
I use AWS Glue because I thought it was worth all they hype Fall 2018. However, you had to use Python 2.7 with no pandas support, and cold starts lasted as long as 15 minutes. Also, setting up a dev environment for iterative development was near impossible at the time.
It was a terrible experience for me. I recommend using Amazon EMR instead. Even talking with a friend that works at Amazon, they use EMR instead of Glue for internal spark workloads. Just because a company makes something doesn't mean they use that something :/
AWS Glue's Features
- Easy - AWS Glue automates much of the effort in building, maintaining, and running ETL jobs. AWS Glue crawls your data sources, identifies data formats, and suggests schemas and transformations. AWS Glue automatically generates the code to execute your data transformations and loading processes.
- Integrated - AWS Glue is integrated across a wide range of AWS services.
- Serverless - AWS Glue is serverless. There is no infrastructure to provision or manage. AWS Glue handles provisioning, configuration, and scaling of the resources required to run your ETL jobs on a fully managed, scale-out Apache Spark environment. You pay only for the resources used while your jobs are running.
- Developer Friendly - AWS Glue generates ETL code that is customizable, reusable, and portable, using familiar technology - Scala, Python, and Apache Spark. You can also import custom readers, writers and transformations into your Glue ETL code. Since the code AWS Glue generates is based on open frameworks, there is no lock-in. You can use it anywhere.