CDAP vs Azure Data Factory: What are the differences?
Developers describe CDAP as "Open source virtualization platform for Hadoop data and apps". Cask Data Application Platform (CDAP) is an open source application development platform for the Hadoop ecosystem. It gives developers data and application virtualization so they can build applications faster, cover a broader range of real-time and batch use cases, and deploy applications into production while meeting enterprise requirements.

On the other hand, Azure Data Factory is described as "Create, Schedule, & Manage Data Pipelines". It is a service designed to let developers integrate disparate data sources. It is somewhat like SSIS (SQL Server Integration Services) in the cloud, managing data that lives both on-premises and in the cloud.
CDAP and Azure Data Factory can be primarily classified as "Big Data" tools.
Some of the features offered by CDAP are:
- Streams for data ingestion
- Reusable libraries for common Big Data access patterns
- Data available to multiple applications and different paradigms
On the other hand, Azure Data Factory provides the following key features:
- Real-Time Integration
- Parallel Processing
- Data Chunker
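To make "Parallel Processing" and "Data Chunker" concrete, here is a minimal, library-agnostic Python sketch of the general idea: split a dataset into fixed-size chunks and process the chunks concurrently. It is not Azure Data Factory's API (ADF configures this inside the service); the function names `chunk`, `copy_chunk`, and `parallel_copy` are hypothetical, and summing each chunk merely stands in for real copy work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence


def chunk(rows: Sequence[int], size: int) -> Iterator[List[int]]:
    """Split rows into consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


def copy_chunk(part: List[int]) -> int:
    """Hypothetical per-chunk work; here it just sums the values."""
    return sum(part)


def parallel_copy(rows: Sequence[int], size: int, workers: int = 4) -> List[int]:
    """Process chunks concurrently; pool.map returns results in chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_chunk, chunk(rows, size)))


if __name__ == "__main__":
    print(parallel_copy(list(range(10)), size=4))  # [6, 22, 17]
```

A thread pool suits I/O-bound work such as copying between stores; for CPU-heavy transformations, `ProcessPoolExecutor` would be the closer fit.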
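CDAP is open source; Azure Data Factory is a managed Azure service, although Microsoft publishes related code on GitHub. By GitHub stars, CDAP (368 stars, 195 forks) appears more widely adopted than Azure Data Factory (150 stars, 255 forks), although Azure Data Factory has more forks, so the comparison is not clear-cut.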
I have to collect data from multiple sources and store it in a single cloud location, then clean and transform it using PySpark, and finally push the results to other applications such as reporting tools. What would be the best solution? I can only think of Azure Data Factory + Databricks. Are there any alternatives to #AWS services + Databricks?
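Whatever tools end up being chosen, the flow described in the question has four stages: collect, clean, transform, and publish. The sketch below shows those stages in plain Python so it runs anywhere; in the setup the question describes, the clean and transform steps would be PySpark jobs (for example on Databricks) and collection would be handled by an orchestration service. The inline sources, field names (`id`, `amount`), and the running-total "transform" are all hypothetical placeholders.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical raw inputs standing in for two different sources.
CSV_SOURCE = "id,amount\n1, 10.5\n2,\n3,7\n"
JSON_SOURCE = '[{"id": 4, "amount": "3.25"}, {"id": 1, "amount": "10.5"}]'


def collect() -> List[Dict[str, str]]:
    """Extract: land raw rows from every source in one staging list."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows += [{k: str(v) for k, v in r.items()} for r in json.loads(JSON_SOURCE)]
    return rows


def clean(rows: List[Dict[str, str]]) -> List[Dict[str, float]]:
    """Clean: trim values, drop rows with no amount, keep the first row per id."""
    by_id: Dict[int, float] = {}
    for r in rows:
        amount = (r.get("amount") or "").strip()
        if not amount:
            continue
        by_id.setdefault(int(r["id"]), float(amount))
    return [{"id": i, "amount": a} for i, a in sorted(by_id.items())]


def transform(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Transform: add a running total (a stand-in for real business logic)."""
    total = 0.0
    out = []
    for r in rows:
        total += r["amount"]
        out.append({**r, "running_total": total})
    return out


def publish(rows: List[Dict[str, float]]) -> str:
    """Load: serialize to CSV, a format most reporting tools can import."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "amount", "running_total"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(publish(transform(clean(collect()))))
```

Keeping each stage a separate function mirrors how the stages usually map to separate services or jobs, so any one of them can be swapped (for example, a different orchestrator) without touching the others.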