AWS Glue vs CDAP: What are the differences?
What is AWS Glue? Fully managed extract, transform, and load (ETL) service. A fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.
What is CDAP? Open source virtualization platform for Hadoop data and apps. Cask Data Application Platform (CDAP) is an open source application development platform for the Hadoop ecosystem that provides developers with data and application virtualization to accelerate application development, address a broader range of real-time and batch use cases, and deploy applications into production while satisfying enterprise requirements.
AWS Glue and CDAP can be primarily classified as "Big Data" tools.
Some of the features offered by AWS Glue are:
- Easy - AWS Glue automates much of the effort in building, maintaining, and running ETL jobs. AWS Glue crawls your data sources, identifies data formats, and suggests schemas and transformations. AWS Glue automatically generates the code to execute your data transformations and loading processes.
- Integrated - AWS Glue is integrated across a wide range of AWS services.
- Serverless - AWS Glue is serverless. There is no infrastructure to provision or manage. AWS Glue handles provisioning, configuration, and scaling of the resources required to run your ETL jobs on a fully managed, scale-out Apache Spark environment. You pay only for the resources used while your jobs are running.
On the other hand, CDAP provides the following key features:
- Streams for data ingestion
- Reusable libraries for common Big Data access patterns
- Data available to multiple applications and different paradigms
CDAP is an open source tool with 346 GitHub stars and 178 GitHub forks. Here's a link to CDAP's open source repository on GitHub.