Alternatives to Azure Data Factory

Azure Databricks, Talend, AWS Data Pipeline, AWS Glue, and Apache NiFi are the most popular alternatives and competitors to Azure Data Factory.

What is Azure Data Factory and what are its top alternatives?

Azure Data Factory is a service designed to let developers integrate disparate data sources. It works somewhat like SSIS in the cloud, managing data that lives both on-premises and in the cloud.
Azure Data Factory is a tool in the Big Data Tools category of a tech stack.
Azure Data Factory's GitHub repository has 469 stars and 571 forks.

Top Alternatives to Azure Data Factory

  • Azure Databricks

    Accelerate big data analytics and artificial intelligence (AI) solutions with Azure Databricks, a fast, easy and collaborative Apache Spark–based analytics service. ...

  • Talend

    Talend is an open source software integration platform that helps you turn data into business insights. It uses native code generation that lets you run your data pipelines seamlessly across all cloud providers and get optimized performance on all platforms. ...

  • AWS Data Pipeline

    AWS Data Pipeline is a web service that provides a simple management system for data-driven workflows. Using AWS Data Pipeline, you define a pipeline composed of the “data sources” that contain your data, the “activities” or business logic such as EMR jobs or SQL queries, and the “schedule” on which your business logic executes. For example, you could define a job that, every hour, runs an Amazon Elastic MapReduce (Amazon EMR)–based analysis on that hour’s Amazon Simple Storage Service (Amazon S3) log data, loads the results into a relational database for future lookup, and then automatically sends you a daily summary email. ...

  • AWS Glue

    A fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. ...

  • Apache NiFi

    An easy to use, powerful, and reliable system to process and distribute data. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. ...

  • Airflow

    Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed. ...

  • Databricks

    Databricks Unified Analytics Platform, from the original creators of Apache Spark™, unifies data science and engineering across the Machine Learning lifecycle from data preparation to experimentation and deployment of ML applications. ...

  • JavaScript

    JavaScript is best known as the scripting language for Web pages, but it is also used in many non-browser environments, such as Node.js or Apache CouchDB. It is a prototype-based, multi-paradigm scripting language that is dynamic and supports object-oriented, imperative, and functional programming styles. ...
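
The AWS Data Pipeline entry above describes a pipeline as three parts: data sources, activities, and a schedule. A minimal sketch of that idea in plain Python follows. The `Pipeline` and `Activity` classes, the bucket path, and the start date are hypothetical names chosen for illustration; this is not the AWS Data Pipeline API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Activity:
    """One unit of business logic, e.g. an EMR job or a SQL query."""
    name: str
    source: str  # name of the data source this activity reads


@dataclass
class Pipeline:
    """Hypothetical in-memory model: data sources + activities + schedule (Python 3.9+)."""
    data_sources: dict[str, str]  # source name -> location
    activities: list[Activity]
    period: timedelta             # how often the activities run

    def validate(self) -> None:
        """Fail early if an activity refers to an undefined data source."""
        for activity in self.activities:
            if activity.source not in self.data_sources:
                raise ValueError(f"{activity.name}: unknown source {activity.source!r}")

    def run_times(self, start: datetime, count: int) -> list[datetime]:
        """Return the first `count` scheduled run times, beginning at `start`."""
        return [start + i * self.period for i in range(count)]


# An hourly analysis of log data, loosely mirroring the example in the description.
hourly_logs = Pipeline(
    data_sources={"s3_logs": "s3://example-bucket/logs/"},  # placeholder location
    activities=[Activity("emr_log_analysis", source="s3_logs")],
    period=timedelta(hours=1),
)
hourly_logs.validate()
first_runs = hourly_logs.run_times(datetime(2024, 1, 1), count=3)
```

Keeping the schedule separate from the activities means the same business logic can be rerun on a different cadence without touching the activity definitions.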

Azure Data Factory alternatives & related posts

Azure Databricks

Fast, easy, and collaborative Apache Spark–based analytics service

      Talend

      A single, unified suite for all integration needs

          related Talend posts

          SnapLogic vs. Talend: which one should you choose when you have a lot of transformation logic to apply to a huge volume of data loaded every day?

          • better monitoring and support
          • better performance
          • easy coding

          AWS Data Pipeline

          Process and move data between different AWS compute and storage services
          PROS OF AWS DATA PIPELINE
          • 1 · Easy to create a DAG and execute it

            AWS Glue

            Fully managed extract, transform, and load (ETL) service
            PROS OF AWS GLUE
            • 9 · Managed Hive Metastore

              related AWS Glue posts

              Will Dataflow be the right replacement for AWS Glue? Are there any unforeseen exceptions, such as proprietary transformations not supported in Google Cloud Dataflow, gaps in the connector ecosystem, or data quality and data cleansing features not supported in Dataflow?

              Also, how about Google Cloud Data Fusion as a replacement, in terms of no-code/low-code? Since basic use cases in Glue are supported through the UI, Cloud Data Fusion may be the right choice in that case.

              What would be the best choice?

              Pardha Saradhi
              Technical Lead at Incred Financial Solutions · 6 upvotes · 102.1K views

              Hi,

              We currently store our data in Amazon S3 in Apache Parquet format. We use Presto to query the data from S3 and catalog it with the AWS Glue catalog. Metabase sits on top of Presto and hosts our reports. Presto is becoming too costly for us, so we are looking for alternatives to it, but we want to keep as much of the remaining setup (S3, Metabase) as possible. Please suggest alternative approaches.

              Apache NiFi

              A reliable system to process and distribute data
              PROS OF APACHE NIFI
              • 17 · Visual Data Flows using Directed Acyclic Graphs (DAGs)
              • 8 · Free (Open Source)
              • 7 · Simple-to-use
              • 5 · Scalable horizontally as well as vertically
              • 5 · Reactive with back-pressure
              • 4 · Fast prototyping
              • 3 · Bi-directional channels
              • 3 · End-to-end security between all nodes
              • 2 · Built-in graphical user interface
              • 2 · Can handle messages up to gigabytes in size
              • 2 · Data provenance
              • 1 · Lots of documentation
              • 1 · HBase support
              • 1 · Support for custom Processor in Java
              • 1 · Hive support
              • 1 · Kudu support
              • 1 · Slack integration
              • 1 · Lots of articles
              CONS OF APACHE NIFI
              • 2 · HA support is not fully fledged
              • 2 · Memory-intensive
              • 1 · Kkk

              related Apache NiFi posts

              John Calandra
              Data Manager at The Garrett Group · 8 upvotes · 359.1K views

              There is a question coming... I am using Oracle VirtualBox to spawn 3 Ubuntu Linux virtual machines (VMs). VM1 is being used as a data lake - just a place to store flat files. VM2 hosts Apache NiFi. VM3 hosts PostgreSQL. I have built a NiFi pipeline that reads flat files on VM1, pipes the data over, and inserts it into the PostgreSQL database. I left this setup alone for a while, and then something hiccupped on VM3, and I had to rebuild it. Now I cannot make a remote connection to PostgreSQL on VM3. I was using pgAdmin3 on VM3, but it kept throwing errors - I found out it went end-of-life in 2018 and uninstalled it. pgAdmin4 is out, but for some reason, I cannot get the APT utility to find/install it. I am trying to figure out the pgAdmin4 install problem and looking for a good alternative to pgAdmin4 that I can use to diagnose the remote database connection problem. Does anyone have any suggestions? Thanks in advance.

              I am looking for the best tool to orchestrate #ETL workflows in non-Hadoop environments, mainly for regression testing use cases. Would Airflow or Apache NiFi be a good fit for this purpose?

              For example, I want to run an Informatica ETL job and then run an SQL task as a dependency, followed by another task from Jira. What tool is best suited to set up such a pipeline?
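
The dependency chain in the question above (ETL job, then SQL task, then a Jira task) can be sketched with Python's standard-library `graphlib`. This is only an illustration of dependency-ordered execution, which is the idea tools like Airflow build on; it is not the API of Airflow or NiFi. The task names and the placeholder callables are assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task names; each task maps to the set of tasks it depends on.
dependencies = {
    "informatica_etl_job": set(),
    "sql_task": {"informatica_etl_job"},
    "jira_task": {"sql_task"},
}


def run_in_order(deps, runners):
    """Run each task only after all of its dependencies have finished."""
    completed = []
    for task in TopologicalSorter(deps).static_order():
        runners[task]()
        completed.append(task)
    return completed


# Placeholder callables stand in for the real Informatica, SQL, and Jira calls.
runners = {
    name: (lambda name=name: print(f"running {name}")) for name in dependencies
}
order = run_in_order(dependencies, runners)
```

An orchestrator adds scheduling, retries, and monitoring on top of this ordering, which is where the choice between tools matters.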

              Airflow

              A platform to programmatically author, schedule, and monitor data pipelines, by Airbnb
              PROS OF AIRFLOW
              • 51 · Features
              • 14 · Task Dependency Management
              • 12 · Beautiful UI
              • 12 · Cluster of workers
              • 10 · Extensibility
              • 6 · Open source
              • 5 · Complex workflows
              • 5 · Python
              • 3 · Good API
              • 3 · Apache project
              • 3 · Custom operators
              • 2 · Dashboard
              CONS OF AIRFLOW
              • 2 · Observability is not great when the DAGs exceed 250
              • 2 · Running it on a Kubernetes cluster is relatively complex
              • 2 · Open source, so it provides minimal or no support
              • 1 · Logical separation of DAGs is not straightforward

              related Airflow posts

              Data science and engineering teams at Lyft maintain several big data pipelines that serve as the foundation for various types of analysis throughout the business.

              Apache Airflow sits at the center of this big data infrastructure, allowing users to “programmatically author, schedule, and monitor data pipelines.” Airflow is an open source tool, and “Lyft is the very first Airflow adopter in production since the project was open sourced around three years ago.”

              There are several key components of the architecture. A web UI allows users to view the status of their queries, along with an audit trail of any modifications to the query. A metadata database stores things like job status and task instance status. A multi-process scheduler handles job requests and triggers the executor to execute those tasks.

              Airflow supports several executors, though Lyft uses CeleryExecutor to scale task execution in production. Airflow is deployed to three Amazon Auto Scaling Groups, with each associated with a celery queue.

              Audit logs supplied to the web UI are powered by the existing Airflow audit logs as well as Flask signal.

              Datadog, Statsd, Grafana, and PagerDuty are all used to monitor the Airflow system.
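
The interplay described above between the metadata database, the scheduler, and the executor can be illustrated with a toy sketch. It is not Airflow's implementation: the in-memory SQLite database, the `task_instance` table layout, the task names, and the `executor` and `scheduler_pass` functions are all assumptions made for this example.

```python
import sqlite3

# In-memory SQLite stands in for the metadata database (assumption for this sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE task_instance (task_id TEXT PRIMARY KEY, state TEXT)")
db.executemany(
    "INSERT INTO task_instance VALUES (?, 'queued')",
    [("extract",), ("transform",), ("load",)],
)


def executor(task_id):
    """Placeholder executor: a real one would hand the task to a worker."""
    print(f"executing {task_id}")
    return "success"


def scheduler_pass(conn):
    """One scheduler pass: pick up queued tasks, run them, record the outcome."""
    queued = conn.execute(
        "SELECT task_id FROM task_instance WHERE state = 'queued' ORDER BY rowid"
    ).fetchall()
    for (task_id,) in queued:
        conn.execute(
            "UPDATE task_instance SET state = 'running' WHERE task_id = ?", (task_id,)
        )
        state = executor(task_id)
        conn.execute(
            "UPDATE task_instance SET state = ? WHERE task_id = ?", (state, task_id)
        )
    conn.commit()


scheduler_pass(db)
states = dict(db.execute("SELECT task_id, state FROM task_instance"))
```

Because every state change goes through the database, a web UI or monitoring tool can report task status by reading the same table the scheduler writes to.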

              We are a young start-up with 2 developers and a team in India, looking to choose our next ETL tool. We have a few processes in Azure Data Factory but are looking to switch to a better platform. We are debating between Trifacta and Airflow, or even staying with Azure Data Factory. The use case will be to feed data to front-end APIs.

              Databricks

              A unified analytics platform, powered by Apache Spark
              PROS OF DATABRICKS
              • 1 · Best performance on large datasets
              • 1 · True lakehouse architecture
              • 1 · Scalability
              • 1 · Databricks doesn't get access to your data
              • 1 · Usage-based billing
              • 1 · Security
              • 1 · Data stays in your cloud account
              • 1 · Multicloud

                related Databricks posts

                Jan Vlnas
                Developer Advocate at Superface · 5 upvotes · 429.8K views

                From my point of view, OpenRefine and Apache Hive serve completely different purposes. OpenRefine is intended for interactive cleaning of messy data locally. You could work with their libraries to use some of OpenRefine's features as part of your data pipeline (there are pointers in the FAQ), but OpenRefine in general is intended for single-user local operation.

                I can't recommend a particular alternative without better understanding of your use case. But if you are looking for an interactive tool to work with big data at scale, take a look at notebook environments like Jupyter, Databricks, or Deepnote. If you are building a data processing pipeline, consider also Apache Spark.

                Edit: Fixed references from Hadoop to Hive, which is actually closer to Spark.

                Vamshi Krishna
                Data Engineer at Tata Consultancy Services · 4 upvotes · 245.9K views

                I have to collect different data from multiple sources and store them in a single cloud location. Then perform cleaning and transforming using PySpark, and push the end results to other applications like reporting tools, etc. What would be the best solution? I can only think of Azure Data Factory + Databricks. Are there any alternatives to #AWS services + Databricks?

                JavaScript

                Lightweight, interpreted, object-oriented language with first-class functions
                PROS OF JAVASCRIPT
                • 1.7K · Can be used on frontend/backend
                • 1.5K · It's everywhere
                • 1.2K · Lots of great frameworks
                • 897 · Fast
                • 745 · Lightweight
                • 425 · Flexible
                • 392 · You can't get a device today that doesn't run JS
                • 286 · Non-blocking I/O
                • 237 · Ubiquitousness
                • 191 · Expressive
                • 55 · Extended functionality to web pages
                • 49 · Relatively easy language
                • 46 · Executed on the client side
                • 30 · Relatively fast to the end user
                • 25 · Pure JavaScript
                • 21 · Functional programming
                • 15 · Async
                • 13 · Full-stack
                • 12 · Setup is easy
                • 12 · Future Language of The Web
                • 12 · It's everywhere
                • 11 · Because I love functions
                • 11 · JavaScript is the New PHP
                • 10 · Like it or not, JS is part of the web standard
                • 9 · Expansive community
                • 9 · Everyone uses it
                • 9 · Can be used in backend, frontend and DB
                • 9 · Easy
                • 8 · Most Popular Language in the World
                • 8 · Powerful
                • 8 · Can be used both as frontend and backend as well
                • 8 · For the good parts
                • 8 · No need to use PHP
                • 8 · Easy to hire developers
                • 7 · Agile, packages simple to use
                • 7 · Love-hate relationship
                • 7 · Photoshop has 3 JS runtimes built in
                • 7 · Evolution of C
                • 7 · It's fun
                • 7 · Hard not to use
                • 7 · Versatile
                • 7 · It's fun and fast
                • 7 · Nice
                • 7 · Popularized Class-Less Architecture & Lambdas
                • 7 · Supports lambdas and closures
                • 6 · It lets me use Babel & TypeScript
                • 6 · Can be used on frontend/backend/Mobile/create PRO Ui
                • 6 · 1.6K Can be used on frontend/backend
                • 6 · Client-side JS uses the visitor's CPU to save Server Res
                • 6 · Easy to make something
                • 5 · ClojureScript
                • 5 · Promise relationship
                • 5 · Stockholm Syndrome
                • 5 · Function expressions are useful for callbacks
                • 5 · Scope manipulation
                • 5 · Everywhere
                • 5 · Client processing
                • 5 · What to add
                • 4 · Because it is so simple and lightweight
                • 4 · Only Programming language on browser
                • 1 · Test
                • 1 · Hard to learn
                • 1 · Test2
                • 1 · Not the best
                • 1 · Easy to understand
                • 1 · Subskill #4
                • 1 · Easy to learn
                • 0 · Hard 彤
                CONS OF JAVASCRIPT
                • 22 · A constant moving target, too much churn
                • 20 · Horribly inconsistent
                • 15 · JavaScript is the New PHP
                • 9 · No ability to monitor memory utilization
                • 8 · Shows Zero output in case of ANY error
                • 7 · Thinks strange results are better than errors
                • 6 · Can be ugly
                • 3 · No GitHub
                • 2 · Slow

                related JavaScript posts

                Zach Holman

                Oof. I have truly hated JavaScript for a long time. Like, for over twenty years now. Like, since the Clinton administration. It's always been a nightmare to deal with all of the aspects of that silly language.

                But wowza, things have changed. Tooling is just way, way better. I'm primarily web-oriented, and using React and Apollo together the past few years really opened my eyes to building rich apps. And I deeply apologize for using the phrase rich apps; I don't think I've ever said such Enterprisey words before.

                But yeah, things are different now. I still love Rails, and still use it for a lot of apps I build. But it's that silly rich apps phrase that's the problem. Users have way more comprehensive expectations than they did even five years ago, and the JS community does a good job at building tools and tech that tackle the problems of making heavy, complicated UI and frontend work.

                Obviously there's a lot of things happening here, so just saying "JavaScript isn't terrible" might encompass a huge amount of libraries and frameworks. But if you're like me, yeah, give things another shot- I'm somehow not hating on JavaScript anymore and... gulp... I kinda love it.

                Conor Myhrvold
                Tech Brand Mgr, Office of CTO at Uber · 44 upvotes · 10.2M views

                How Uber developed the open source, end-to-end distributed tracing system Jaeger, now a CNCF project:

                Distributed tracing is quickly becoming a must-have component in the tools that organizations use to monitor their complex, microservice-based architectures. At Uber, our open source distributed tracing system Jaeger saw large-scale internal adoption throughout 2016, integrated into hundreds of microservices and now recording thousands of traces every second.

                Here is the story of how we got here, from investigating off-the-shelf solutions like Zipkin, to why we switched from pull to push architecture, and how distributed tracing will continue to evolve:

                https://eng.uber.com/distributed-tracing/

                (GitHub Pages: https://www.jaegertracing.io/, GitHub: https://github.com/jaegertracing/jaeger)

                Bindings/Operator: Python, Java, Node.js, Go, C++, Kubernetes, JavaScript, OpenShift, C#, Apache Spark
