
Alternatives to Matillion

Talend, Alooma, AWS Glue, Stitch, and Airflow are the most popular alternatives and competitors to Matillion.

What is Matillion and what are its top alternatives?

Matillion is a cloud ETL/ELT tool with a modern, browser-based UI and powerful push-down transformation functionality. Setup is fast, so you are up and running in minutes.
Matillion is a tool in the Big Data as a Service category of a tech stack.

Top Alternatives to Matillion

  • Talend

    An open-source software integration platform that helps you effortlessly turn data into business insights. It uses native code generation, letting you run your data pipelines seamlessly across all cloud providers with optimized performance on every platform.

  • Alooma

    Get the power of big data in minutes with Alooma and Amazon Redshift. Simply build your pipelines and map your events using Alooma's friendly mapping interface. Query, analyze, visualize, and predict now.

  • AWS Glue

    A fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.

  • Stitch

    Stitch is a simple, powerful ETL service built for software developers. Stitch evolved out of RJMetrics, a widely used business intelligence platform. When RJMetrics was acquired by Magento in 2016, Stitch was launched as its own company.

  • Airflow

    Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command-line utilities make performing complex operations on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.

  • dbt

    dbt is a transformation workflow that lets teams deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation. Now anyone who knows SQL can build production-grade data pipelines.

  • Amazon Redshift

    Amazon Redshift is optimized for data sets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.

  • Google BigQuery

    Run super-fast, SQL-like queries against terabytes of data in seconds, using the processing power of Google's infrastructure. Load data with ease: bulk load your data using Google Cloud Storage or stream it in. Easy access: access BigQuery by using a browser tool, a command-line tool, or by making calls to the BigQuery REST API with client libraries such as Java, PHP, or Python.

Matillion alternatives & related posts

Talend

A single, unified suite for all integration needs

Alooma

Integrate any data source like databases, applications, and any API - with your own Amazon Redshift

AWS Glue

Fully managed extract, transform, and load (ETL) service

PROS OF AWS GLUE
  • Managed Hive Metastore (8)

related AWS Glue posts

Pardha Saradhi
Technical Lead at Incred Financial Solutions

Hi,

We are currently storing the data in Amazon S3 using Apache Parquet format. We are using Presto to query the data from S3 and catalog it using the AWS Glue catalog. We have Metabase sitting on top of Presto, where our reports are present. Currently, Presto is becoming too costly for us, and we are looking for alternatives to it but want to reuse the rest of the setup (S3, Metabase) as much as possible. Please suggest alternative approaches.
Punith Ganadinni
Senior Product Engineer

Hey all, I need some suggestions on creating a replica of our RDS DB for reporting and analytical purposes. Cost is a major factor. I was thinking of using AWS Glue to move data from Amazon RDS to Amazon S3 and using Amazon Athena to run queries on it. Any other suggestions would be appreciated.
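In setups like these, Athena and the Glue catalog discover partitions from Hive-style `key=value` prefixes in the S3 keys. A minimal sketch of generating such a prefix (the table and column names are hypothetical, not from the posts above):

```python
from datetime import date

def partition_prefix(table: str, dt: date) -> str:
    """Build a Hive-style S3 key prefix (year=/month=/day=) that AWS Glue
    crawlers and Athena recognize as date partition columns."""
    return f"{table}/year={dt.year}/month={dt.month:02d}/day={dt.day:02d}/"

# Example: where a Parquet file for 2024-03-07 would land under the table prefix.
prefix = partition_prefix("events", date(2024, 3, 7))
print(prefix)  # events/year=2024/month=03/day=07/
```

Writing Parquet under prefixes like this lets Athena prune partitions in the WHERE clause, which is often the main lever for cutting per-query scan costs.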
Stitch

All your data. In your data warehouse. In minutes.

PROS OF STITCH
  • 3 minutes to set up (7)
  • Super simple, great support (4)

related Stitch posts

Ankit Sobti

Looker, Stitch, Amazon Redshift, dbt

We recently moved our Data Analytics and Business Intelligence tooling to Looker. It's already helping us create a solid process for reusable SQL-based data modeling, with consistent definitions across the entire organization. Looker allows us to collaboratively build these version-controlled models and push the limits of what we've traditionally been able to accomplish with analytics with a lean team.

For Data Engineering, we're in the process of moving from maintaining our own ETL pipelines on AWS to a managed ELT system on Stitch. We're also evaluating the command-line tool dbt to manage data transformations. Our hope is that Stitch + dbt will streamline the ELT bit, allowing us to focus our energies on analyzing data, rather than managing it.
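The ELT split described in the post, where a loader like Stitch lands raw data and dbt then transforms it with SQL inside the warehouse, can be illustrated with sqlite standing in for Redshift. This is a toy sketch with made-up table names; dbt itself manages such SELECTs as versioned, documented models rather than ad-hoc scripts:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for the warehouse
# 1) "EL": the loader (Stitch's role) lands raw rows untouched.
con.execute("CREATE TABLE raw_orders (id INTEGER, amount REAL, status TEXT)")
con.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 20.0, "paid"), (2, 35.0, "refunded"), (3, 10.0, "paid")],
)
# 2) "T": the transformation (dbt's role) is plain SQL run in the warehouse,
#    materializing a cleaned model from the raw table.
con.execute(
    "CREATE TABLE orders AS "
    "SELECT id, amount FROM raw_orders WHERE status = 'paid'"
)
total = con.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)  # 30.0
```

The design point is that the transformation is expressed entirely in the warehouse's SQL dialect, so it scales with the warehouse and stays reviewable like any other code.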
Airflow

A platform to programmatically author, schedule and monitor data pipelines, by Airbnb

PROS OF AIRFLOW
  • Features (49)
  • Task Dependency Management (14)
  • Beautiful UI (12)
  • Cluster of workers (12)
  • Extensibility (10)
  • Open source (5)
  • Python (5)
  • Complex workflows (4)
  • K (3)
  • Good API (3)
  • Custom operators (2)
  • Apache project (2)
  • Dashboard (2)

CONS OF AIRFLOW
  • Running it on a Kubernetes cluster is relatively complex (2)
  • Open source - provides minimal or no support (2)
  • Logical separation of DAGs is not straightforward (1)
  • Observability is not great when DAGs exceed 250 (1)

related Airflow posts

Shared insights on Jenkins and Airflow

I am looking for an open-source scheduler tool with cross-functional application dependencies. Some of the tasks I am looking to schedule are as follows:

1. Trigger Matillion ETL loads
2. Trigger Attunity Replication tasks that have downstream ETL loads
3. Trigger GoldenGate Replication tasks
4. Shell scripts, wrappers, file watchers
5. Event-driven schedules

I have used Airflow in the past, and I know we need to create DAGs for each pipeline. I am not familiar with Jenkins, but I know it works with configuration without much underlying code. I want to evaluate both and would appreciate any advice.
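Airflow models workloads like the numbered list above as a directed acyclic graph and runs each task only after its upstream dependencies have succeeded. The core ordering idea can be sketched in plain Python with the standard library; the task names are illustrative, not from the post:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Hypothetical dependencies: each task maps to the set of tasks it depends on,
# mirroring how Airflow's scheduler orders tasks within a DAG.
deps = {
    "trigger_etl_load": {"replication_task"},
    "file_watcher": set(),
    "replication_task": {"file_watcher"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # the file watcher runs first, then replication, then the ETL load
```

In a real Airflow DAG the same dependencies would be declared with operators and `>>` chaining, and the scheduler would also handle retries, backfills, and parallel execution across workers.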
Shared insights on AWS Step Functions and Airflow

I am working on a project that grabs a set of input data from AWS S3, pre-processes and divvies it up, spins up 10K batch containers to process the divvied data in parallel on AWS Batch, post-aggregates the data, and pushes it to S3.

I already have software patterns from other projects for Airflow + Batch but have not dealt with the scaling factors of 10k parallel tasks. Airflow is nice since I can look at which tasks failed and retry a task after debugging. But dealing with that many tasks on one Airflow EC2 instance seems like a barrier. Another option would be to have one task that kicks off the 10k containers and monitors it from there.

I have no experience with AWS Step Functions but have heard it's AWS's Airflow. There looks to be plenty of patterns online for Step Functions + Batch. Do Step Functions seem like a good path to check out for my use case? Do you get the same insights on failing jobs / ability to retry tasks as you do with Airflow?
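The fan-out-with-retry pattern the post is weighing can be sketched in plain Python; here threads stand in for Batch containers and everything is hypothetical, since both Airflow and Step Functions would manage this state for you:

```python
from concurrent.futures import ThreadPoolExecutor

def process(chunk):
    # Stand-in for one batch container processing one data chunk.
    if chunk == "bad":
        raise ValueError("simulated failure")
    return len(chunk)

def run_with_retries(chunks, retries=2):
    """Run all chunks in parallel; collect failures and retry them,
    similar to re-running failed tasks in an orchestrator."""
    results, failed = {}, list(chunks)
    for _ in range(retries + 1):
        with ThreadPoolExecutor(max_workers=8) as pool:
            futures = {c: pool.submit(process, c) for c in failed}
        still_failed = []
        for c, f in futures.items():
            try:
                results[c] = f.result()
            except Exception:
                still_failed.append(c)
        failed = still_failed
        if not failed:
            break
    return results, failed

results, failed = run_with_retries(["aa", "bbb", "bad"])
print(results, failed)  # {'aa': 2, 'bbb': 3} ['bad']
```

The orchestration question then reduces to who tracks `failed`: with Airflow it is task state you can inspect and clear in the UI, with Step Functions it is retry/catch policy in the state machine definition.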
dbt

dbt helps data teams work like software engineers—to ship trusted data, faster.

PROS OF DBT
  • Easy for SQL programmers to learn (1)
  • Modularity, portability, CI/CD, and documentation (1)
  • Faster integrated testing (1)
  • Reusable macros (1)
  • Schedule jobs (1)
  • CI/CD (1)

Amazon Redshift

Fast, fully managed, petabyte-scale data warehouse service

PROS OF AMAZON REDSHIFT
  • Data Warehousing (37)
  • Scalable (27)
  • SQL (17)
  • Backed by Amazon (14)
  • Encryption (5)
  • Cheap and reliable (1)
  • Isolation (1)
  • Best Cloud DW Performance (1)
  • Fast columnar storage (1)

related Amazon Redshift posts

Julien DeFrance
Principal Software Engineer at Tophatter

Back in 2014, I was given an opportunity to re-architect the SmartZip Analytics platform and its flagship product, SmartTargeting. This is SaaS software that helps real estate professionals keep up with their prospects and leads in a given neighborhood/territory, find out (thanks to predictive analytics) who's the most likely to list/sell their home, and run cross-channel marketing automation against them: direct mail, online ads, email... The company also provides Data APIs to Enterprise customers.

I had inherited years and years of technical debt and I knew things had to change radically. The first enabler was to make use of the cloud and go with AWS, so we would stop re-inventing the wheel and build around managed/scalable services.

For the SaaS product, we kept on working with Rails as this was what my team had the most knowledge in. We have, however, broken up the monolith and decoupled the front-end application from the backend thanks to the use of Rails API, so we'd get independently scalable micro-services from now on.

Our various applications could now be deployed using AWS Elastic Beanstalk, so we wouldn't waste any more effort writing time-consuming Capistrano deployment scripts, for instance. Combined with Docker, our application would run within its own container, independently from the underlying host configuration.

Storage-wise, we went with Amazon S3 and ditched any pre-existing local or network storage people used to deal with in our legacy systems. On the database side: Amazon RDS / MySQL initially, ultimately migrated to Amazon RDS for Aurora / MySQL when it got released. Once again, here you want a managed service your cloud provider handles for you.

Future improvements / technology decisions included:

- Caching: Amazon ElastiCache / Memcached
- CDN: Amazon CloudFront
- Systems Integration: Segment / Zapier
- Data-warehousing: Amazon Redshift
- BI: Amazon Quicksight / Superset
- Search: Elasticsearch / Amazon Elasticsearch Service / Algolia
- Monitoring: New Relic

As our usage grew, patterns changed, and/or our business needs evolved, my role as Engineering Manager then Director of Engineering was also to ensure my team kept on learning and innovating, while delivering on business value.

One of these innovations was to get ourselves into serverless: adopting AWS Lambda was a big step forward. At the time, it was only available for Node.js (not Ruby), but a great way to handle cost efficiency, unpredictable traffic, sudden bursts of traffic... Ultimately you want the whole chain of services involved in a call to be serverless, and that's when we started leveraging Amazon DynamoDB on these projects so they'd be fully scalable.
Google BigQuery

Analyze terabytes of data in seconds

PROS OF GOOGLE BIGQUERY
  • High Performance (27)
  • Easy to use (24)
  • Fully managed service (21)
  • Cheap Pricing (19)
  • Process hundreds of GB in seconds (16)
  • Full table scans in seconds, no indexes needed (11)
  • Big Data (11)
  • Always on, no per-hour costs (8)
  • Good combination with fluentd (5)
  • Machine learning (4)

CONS OF GOOGLE BIGQUERY
  • You can't unit test changes in BQ data (1)

related Google BigQuery posts

Context: I wanted to create an end-to-end IoT data pipeline simulation in Google Cloud IoT Core and other GCP services. I never touched Terraform meaningfully until working on this project, and it's one of the best explorations in my development career. The documentation and syntax are incredibly human-readable and friendly. I'm used to building infrastructure through the Google APIs via Python, but I'm so glad past Sung did not make that decision. I was tempted to use Google Cloud Deployment Manager, but the templates were a bit convoluted on first impression. I'm glad past Sung did not make this decision either.

Solution: Leveraging Google Cloud Build, Google Cloud Run, Google Cloud Bigtable, Google BigQuery, Google Cloud Storage, and Google Compute Engine, along with some other fun tools, I can deploy over 40 GCP resources using Terraform!
Tim Specht
Co-Founder and CTO at Dubsmash

In order to accurately measure & track user behaviour on our platform, we moved over quickly from the initial solution using Google Analytics to a custom-built one due to resource & pricing concerns we had.

While this does sound complicated, it's as easy as clients sending JSON blobs of events to Amazon Kinesis, from where we use AWS Lambda & Amazon SQS to batch and process incoming events and then ingest them into Google BigQuery. Once events are stored in BigQuery (which usually only takes a second from the time the client sends the data until it's available), we can use almost-standard SQL to simply query for data, while Google makes sure that, even with terabytes of data being scanned, query times stay in the range of seconds rather than hours. Before ingesting their data into the pipeline, our mobile clients aggregate events internally and, once a certain threshold is reached or the app is going to the background, send the events as a JSON blob into the stream.

In the past we had workers running that continuously read from the stream, validated and post-processed the data, and then enqueued it for other workers to write to BigQuery. We went ahead and implemented the Lambda-based approach in such a way that Lambda functions would automatically be triggered for incoming records, pre-aggregate events, and write them back to SQS, from which we then read them and persist the events to BigQuery. While this approach had a couple of bumps on the road, like re-triggering functions asynchronously to keep up with the stream and finding proper batch sizes, we finally managed to get it running in a reliable way and are very happy with this solution today.

#ServerlessTaskProcessing #GeneralAnalytics #RealTimeDataProcessing #BigDataAsAService
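The threshold-based client batching described above can be sketched as follows; the class and names are illustrative, not Dubsmash's actual code, and the `send` callable stands in for a Kinesis put call:

```python
import json

class EventBuffer:
    """Accumulate events client-side and flush them as one JSON blob
    once a size threshold is reached (hypothetical sketch)."""

    def __init__(self, threshold: int, send):
        self.threshold = threshold
        self.send = send          # e.g. a call that puts a record on the stream
        self.events = []

    def add(self, event: dict) -> None:
        self.events.append(event)
        if len(self.events) >= self.threshold:
            self.flush()

    def flush(self) -> None:
        if self.events:           # also called when the app goes to background
            self.send(json.dumps(self.events))
            self.events = []

# Usage: collect blobs in a list instead of hitting a real stream.
sent = []
buf = EventBuffer(threshold=3, send=sent.append)
for i in range(7):
    buf.add({"event": i})
buf.flush()                       # simulate the app going to the background
print(len(sent))  # 3 blobs: two full batches of 3, one partial batch of 1
```

Batching like this trades a little latency for far fewer network calls, which is why the final flush on backgrounding matters: without it the tail of events would be lost.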
