What is GitHub Actions and what are its top alternatives?
Top Alternatives to GitHub Actions
- CircleCI
Continuous integration and delivery platform helps software teams rapidly release code with confidence by automating the build, test, and deploy process. Offers a modern software development platform that lets teams ramp. ...
- Jenkins
In a nutshell Jenkins CI is the leading open-source continuous integration server. Built with Java, it provides over 300 plugins to support building and testing virtually any project. ...
- Azure Pipelines
Fast builds with parallel jobs and test execution. Use container jobs to create consistent and reliable builds with the exact tools you need. Create new containers with ease and push them to any registry. ...
- GitLab CI
GitLab offers a continuous integration service. If you add a .gitlab-ci.yml file to the root directory of your repository, and configure your GitLab project to use a Runner, then each merge request or push triggers your CI pipeline. ...
- GitLab
GitLab offers git repository management, code reviews, issue tracking, activity feeds and wikis. Enterprises install GitLab on-premise and connect it with LDAP and Active Directory servers for secure authentication and authorization. A single GitLab server can handle more than 25,000 users but it is also possible to create a high availability setup with multiple active servers. ...
- Buildkite
CI and build automation tool that combines the power of your own build infrastructure with the convenience of a managed, centralized web UI. Used by Shopify, Basecamp, Digital Ocean, Venmo, Cochlear, Bugsnag and more. ...
- Travis CI
Free for open source projects, our CI environment provides multiple runtimes (e.g. Node.js or PHP versions), data stores and so on. Because of this, hosting your project on travis-ci.com means you can effortlessly test your library or applications against multiple runtimes and data stores without even having all of them installed locally. ...
- Airflow
Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command lines utilities makes performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress and troubleshoot issues when needed. ...
GitHub Actions alternatives & related posts
- Github integration226
- Easy setup177
- Fast builds153
- Competitively priced94
- Slack integration74
- Docker support55
- Awesome UI45
- Great customer support33
- Ios support18
- Hipchat integration14
- SSH debug access13
- Free for Open Source11
- Mobile support6
- Nodejs support5
- Bitbucket integration5
- YAML configuration5
- AWS CodeDeploy integration4
- Free for Github private repo3
- Great support3
- Clojurescript2
- Continuous Deployment2
- Parallelism2
- Clojure2
- OSX support2
- Simple, clean UI2
- Unstable1
- Ci1
- Favorite1
- Helpful documentation1
- Autoscaling1
- Extremely configurable1
- Works1
- Android support1
- Fair pricing1
- All inclusive testing1
- Japanese in rspec comment appears OK1
- Build PR Branch Only1
- So circular1
- Easy setup, easy to understand, fast and reliable1
- Parallel builds for slow test suites1
- Easy setup. 2.0 is fast!1
- Easy to deploy to private servers1
- Really easy to use1
- Stable0
- Unstable12
- Scammy pricing structure6
- Aggressive Github permissions0
related CircleCI posts
Often enough I have to explain my way of going about setting up a CI/CD pipeline with multiple deployment platforms. Since I am a bit tired of yapping the same every single time, I've decided to write it up and share with the world this way, and send people to read it instead ;). I will explain it on "live-example" of how the Rome got built, basing that current methodology exists only of readme.md and wishes of good luck (as it usually is ;)).
It always starts with an app, whatever it may be and reading the readmes available while Vagrant and VirtualBox is installing and updating. Following that is the first hurdle to go over - convert all the instruction/scripts into Ansible playbook(s), and only stopping when doing a clear vagrant up
or vagrant reload
we will have a fully working environment. As our Vagrant environment is now functional, it's time to break it! This is the moment to look for how things can be done better (too rigid/too lose versioning? Sloppy environment setup?) and replace them with the right way to do stuff, one that won't bite us in the backside. This is the point, and the best opportunity, to upcycle the existing way of doing dev environment to produce a proper, production-grade product.
I should probably digress here for a moment and explain why. I firmly believe that the way you deploy production is the same way you should deploy develop, shy of few debugging-friendly setting. This way you avoid the discrepancy between how production work vs how development works, which almost always causes major pains in the back of the neck, and with use of proper tools should mean no more work for the developers. That's why we start with Vagrant as developer boxes should be as easy as vagrant up
, but the meat of our product lies in Ansible which will do meat of the work and can be applied to almost anything: AWS, bare metal, docker, LXC, in open net, behind vpn - you name it.
We must also give proper consideration to monitoring and logging hoovering at this point. My generic answer here is to grab Elasticsearch, Kibana, and Logstash. While for different use cases there may be better solutions, this one is well battle-tested, performs reasonably and is very easy to scale both vertically (within some limits) and horizontally. Logstash rules are easy to write and are well supported in maintenance through Ansible, which as I've mentioned earlier, are at the very core of things, and creating triggers/reports and alerts based on Elastic and Kibana is generally a breeze, including some quite complex aggregations.
If we are happy with the state of the Ansible it's time to move on and put all those roles and playbooks to work. Namely, we need something to manage our CI/CD pipelines. For me, the choice is obvious: TeamCity. It's modern, robust and unlike most of the light-weight alternatives, it's transparent. What I mean by that is that it doesn't tell you how to do things, doesn't limit your ways to deploy, or test, or package for that matter. Instead, it provides a developer-friendly and rich playground for your pipelines. You can do most the same with Jenkins, but it has a quite dated look and feel to it, while also missing some key functionality that must be brought in via plugins (like quality REST API which comes built-in with TeamCity). It also comes with all the common-handy plugins like Slack or Apache Maven integration.
The exact flow between CI and CD varies too greatly from one application to another to describe, so I will outline a few rules that guide me in it: 1. Make build steps as small as possible. This way when something breaks, we know exactly where, without needing to dig and root around. 2. All security credentials besides development environment must be sources from individual Vault instances. Keys to those containers should exist only on the CI/CD box and accessible by a few people (the less the better). This is pretty self-explanatory, as anything besides dev may contain sensitive data and, at times, be public-facing. Because of that appropriate security must be present. TeamCity shines in this department with excellent secrets-management. 3. Every part of the build chain shall consume and produce artifacts. If it creates nothing, it likely shouldn't be its own build. This way if any issue shows up with any environment or version, all developer has to do it is grab appropriate artifacts to reproduce the issue locally. 4. Deployment builds should be directly tied to specific Git branches/tags. This enables much easier tracking of what caused an issue, including automated identifying and tagging the author (nothing like automated regression testing!).
Speaking of deployments, I generally try to keep it simple but also with a close eye on the wallet. Because of that, I am more than happy with AWS or another cloud provider, but also constantly peeking at the loads and do we get the value of what we are paying for. Often enough the pattern of use is not constantly erratic, but rather has a firm baseline which could be migrated away from the cloud and into bare metal boxes. That is another part where this approach strongly triumphs over the common Docker and CircleCI setup, where you are very much tied in to use cloud providers and getting out is expensive. Here to embrace bare-metal hosting all you need is a help of some container-based self-hosting software, my personal preference is with Proxmox and LXC. Following that all you must write are ansible scripts to manage hardware of Proxmox, similar way as you do for Amazon EC2 (ansible supports both greatly) and you are good to go. One does not exclude another, quite the opposite, as they can live in great synergy and cut your costs dramatically (the heavier your base load, the bigger the savings) while providing production-grade resiliency.
We actually started out on Travis CI, but we've migrated our main builds to CircleCI, and it's been a huge improvement.
The reason it's been a huge improvement is that Travis CI has a fundamentally bad design for their images, where they start with a standard base Linux image containing tons of packages (several versions of postgres, every programming language environment, etc). This is potentially nice for the "get builds for a small project running quickly" use case, but it's a total disaster for a larger project that needs a decent number of dependencies and cares about the performance and reliability of their build.
This issue is exacerbated by their networking infrastructure being unreliable; we usually saw over 1% of builds failing due to transient networking errors in Travis CI, even after we added retries to the most frequently failing operations like apt update
or pip install
. And they never install Ubuntu's point release updates to their images. So doing an apt update
, apt install
, or especially apt upgrade
would take forever. We ended up writing code to actually uninstall many of their base packages and pin the versions of hundreds of others to get a semi-fast, semi-reliable build. It was infuriating.
The CircleCI v2.0 system has the right design for a CI system: we can customize the base image to start with any expensive-to-install packages we need for our build, and we can update that image if and when we want to. The end result is that when migrating, we were able to delete all the hacky optimizations mentioned above, while still ending up with a 50% faster build latency. And we've also had 5-10x fewer issues with networking-related flakes, which means one doesn't have to constantly check whether a build failure is actually due to an issue with the code under test or "just another networking flake".
- Hosted internally523
- Free open source469
- Great to build, deploy or launch anything async318
- Tons of integrations243
- Rich set of plugins with good documentation211
- Has support for build pipelines111
- Easy setup68
- It is open-source66
- Workflow plugin53
- Configuration as code13
- Very powerful tool12
- Many Plugins11
- Great flexibility10
- Continuous Integration10
- Git and Maven integration is better9
- 100% free and open source8
- Slack Integration (plugin)7
- Github integration7
- Easy customisation6
- Self-hosted GitLab Integration (plugin)6
- Docker support5
- Pipeline API5
- Fast builds4
- Hosted Externally4
- Excellent docker integration4
- Platform idnependency4
- It's Everywhere3
- Can be run as a Docker container3
- AWS Integration3
- Customizable3
- JOBDSL3
- It`w worked3
- Loose Coupling2
- Build PR Branch Only2
- PHP Support2
- Easily extendable with seamless integration2
- Ruby/Rails Support2
- Universal controller2
- NodeJS Support2
- Workarounds needed for basic requirements13
- Groovy with cumbersome syntax10
- Plugins compatibility issues8
- Lack of support7
- Limited abilities with declarative pipelines7
- No YAML syntax5
- Too tied to plugins versions4
related Jenkins posts
Often enough I have to explain my way of going about setting up a CI/CD pipeline with multiple deployment platforms. Since I am a bit tired of yapping the same every single time, I've decided to write it up and share with the world this way, and send people to read it instead ;). I will explain it on "live-example" of how the Rome got built, basing that current methodology exists only of readme.md and wishes of good luck (as it usually is ;)).
It always starts with an app, whatever it may be and reading the readmes available while Vagrant and VirtualBox is installing and updating. Following that is the first hurdle to go over - convert all the instruction/scripts into Ansible playbook(s), and only stopping when doing a clear vagrant up
or vagrant reload
we will have a fully working environment. As our Vagrant environment is now functional, it's time to break it! This is the moment to look for how things can be done better (too rigid/too lose versioning? Sloppy environment setup?) and replace them with the right way to do stuff, one that won't bite us in the backside. This is the point, and the best opportunity, to upcycle the existing way of doing dev environment to produce a proper, production-grade product.
I should probably digress here for a moment and explain why. I firmly believe that the way you deploy production is the same way you should deploy develop, shy of few debugging-friendly setting. This way you avoid the discrepancy between how production work vs how development works, which almost always causes major pains in the back of the neck, and with use of proper tools should mean no more work for the developers. That's why we start with Vagrant as developer boxes should be as easy as vagrant up
, but the meat of our product lies in Ansible which will do meat of the work and can be applied to almost anything: AWS, bare metal, docker, LXC, in open net, behind vpn - you name it.
We must also give proper consideration to monitoring and logging hoovering at this point. My generic answer here is to grab Elasticsearch, Kibana, and Logstash. While for different use cases there may be better solutions, this one is well battle-tested, performs reasonably and is very easy to scale both vertically (within some limits) and horizontally. Logstash rules are easy to write and are well supported in maintenance through Ansible, which as I've mentioned earlier, are at the very core of things, and creating triggers/reports and alerts based on Elastic and Kibana is generally a breeze, including some quite complex aggregations.
If we are happy with the state of the Ansible it's time to move on and put all those roles and playbooks to work. Namely, we need something to manage our CI/CD pipelines. For me, the choice is obvious: TeamCity. It's modern, robust and unlike most of the light-weight alternatives, it's transparent. What I mean by that is that it doesn't tell you how to do things, doesn't limit your ways to deploy, or test, or package for that matter. Instead, it provides a developer-friendly and rich playground for your pipelines. You can do most the same with Jenkins, but it has a quite dated look and feel to it, while also missing some key functionality that must be brought in via plugins (like quality REST API which comes built-in with TeamCity). It also comes with all the common-handy plugins like Slack or Apache Maven integration.
The exact flow between CI and CD varies too greatly from one application to another to describe, so I will outline a few rules that guide me in it: 1. Make build steps as small as possible. This way when something breaks, we know exactly where, without needing to dig and root around. 2. All security credentials besides development environment must be sources from individual Vault instances. Keys to those containers should exist only on the CI/CD box and accessible by a few people (the less the better). This is pretty self-explanatory, as anything besides dev may contain sensitive data and, at times, be public-facing. Because of that appropriate security must be present. TeamCity shines in this department with excellent secrets-management. 3. Every part of the build chain shall consume and produce artifacts. If it creates nothing, it likely shouldn't be its own build. This way if any issue shows up with any environment or version, all developer has to do it is grab appropriate artifacts to reproduce the issue locally. 4. Deployment builds should be directly tied to specific Git branches/tags. This enables much easier tracking of what caused an issue, including automated identifying and tagging the author (nothing like automated regression testing!).
Speaking of deployments, I generally try to keep it simple but also with a close eye on the wallet. Because of that, I am more than happy with AWS or another cloud provider, but also constantly peeking at the loads and do we get the value of what we are paying for. Often enough the pattern of use is not constantly erratic, but rather has a firm baseline which could be migrated away from the cloud and into bare metal boxes. That is another part where this approach strongly triumphs over the common Docker and CircleCI setup, where you are very much tied in to use cloud providers and getting out is expensive. Here to embrace bare-metal hosting all you need is a help of some container-based self-hosting software, my personal preference is with Proxmox and LXC. Following that all you must write are ansible scripts to manage hardware of Proxmox, similar way as you do for Amazon EC2 (ansible supports both greatly) and you are good to go. One does not exclude another, quite the opposite, as they can live in great synergy and cut your costs dramatically (the heavier your base load, the bigger the savings) while providing production-grade resiliency.
Releasing new versions of our services is done by Travis CI. Travis first runs our test suite. Once it passes, it publishes a new release binary to GitHub.
Common tasks such as installing dependencies for the Go project, or building a binary are automated using plain old Makefiles. (We know, crazy old school, right?) Our binaries are compressed using UPX.
Travis has come a long way over the past years. I used to prefer Jenkins in some cases since it was easier to debug broken builds. With the addition of the aptly named “debug build” button, Travis is now the clear winner. It’s easy to use and free for open source, with no need to maintain anything.
#ContinuousIntegration #CodeCollaborationVersionControl
- Easy to get started4
- Unlimited CI/CD minutes3
- Built by Microsoft3
- Yaml support2
- Docker support2
related Azure Pipelines posts
We recently added new APIs to Jira to associate information about Builds and Deployments to Jira issues.
The new APIs were developed using a spec-first API approach for speed and sanity. The details of this approach are described in this blog post, and we relied on using Swagger and associated tools like Swagger UI.
A new service was created for managing the data. It provides a REST API for external use, and an internal API based on GraphQL. The service is built using Kotlin for increased developer productivity and happiness, and the Spring-Boot framework. PostgreSQL was chosen for the persistence layer, as we have non-trivial requirements that cannot be easily implemented on top of a key-value store.
The front-end has been built using React and querying the back-end service using an internal GraphQL API. We have plans of providing a public GraphQL API in the future.
New Jira Integrations: Bitbucket CircleCI AWS CodePipeline Octopus Deploy jFrog Azure Pipelines
Kindly suggest the best tool for generating 10Mn+ concurrent user load. The tool must support MQTT traffic, REST API, support to interfaces such as Kafka, websockets, persistence HTTP connection, auth type support to assess the support /coverage.
The tool can be integrated into CI pipelines like Azure Pipelines, GitHub, and Jenkins.
- Robust CI with awesome Docker support22
- Simple configuration13
- All in one solution9
- Source Control and CI in one place7
- Integrated with VCS on commit5
- Free and open source5
- Easy to configure own build server i.e. GitLab-Runner5
- Hosted internally2
- Built-in Docker Registry1
- Built-in support of Review Apps1
- Pipeline could be started manually1
- Enable or disable pipeline by using env variables1
- Gitlab templates could be shared across logical group1
- Easy to setup the dedicated runner to particular job1
- Built-in support of Kubernetes1
- Works best with GitLab repositories2
related GitLab CI posts
We use GitLab CI because of the great native integration as a part of the GitLab framework and the linting-capabilities it offers. The visualization of complex pipelines and the embedding within the project overview made Gitlab CI even more convenient. We use it for all projects, all deployments and as a part of GitLab Pages.
While we initially used the Shell-executor, we quickly switched to the Docker-executor and use it exclusively now.
We formerly used Jenkins but preferred to handle everything within GitLab . Aside from the unification of our infrastructure another motivation was the "configuration-in-file"-approach, that Gitlab CI offered, while Jenkins support of this concept was very limited and users had to resort to using the webinterface. Since the file is included within the repository, it is also version controlled, which was a huge plus for us.
We are using GitLab CI and were very happy with it. The integration of all tools like CI/CD, tickets, etc makes it very easy to stay on top of things. But be aware, Gitlab currently does not have iOS build support. So if you want to exchange that for CircleCI / Codeship to have to invest some effort. We are using a managed Mac OS device and installed the Gitlab runner there, to have iOS builds.
- Self hosted507
- Free429
- Has community edition339
- Easy setup242
- Familiar interface240
- Includes many features, including ci137
- Nice UI113
- Good integration with gitlabci84
- Simple setup57
- Free private repository34
- Has an official mobile app34
- Continuous Integration31
- Open source, great ui (like github)22
- Slack Integration18
- Full CI flow14
- Free and unlimited private git repos11
- User, group, and project access management is simple10
- All in one (Git, CI, Agile..)9
- Built-in CI8
- Intuitive UI8
- Full DevOps suite with Git6
- Both public and private Repositories6
- Integrated Docker Registry5
- Build/pipeline definition alongside code5
- So easy to use5
- CI5
- It's powerful source code management tool5
- Unlimited free repos & collaborators4
- Security and Stable4
- On-premises4
- It's fully integrated4
- Excellent4
- Issue system4
- Mattermost Chat client4
- Dockerized4
- Great for team collaboration3
- Free private repos3
- Because is the best remote host for git repositories3
- Low maintenance cost due omnibus-deployment3
- Not Microsoft Owned3
- Built-in Docker Registry3
- Opensource3
- I like the its runners and executors feature3
- Multilingual interface2
- Powerful software planning and maintaining tools2
- Review Apps feature2
- Kubernetes integration with GitLab CI2
- One-click install through DigitalOcean2
- Powerful Continuous Integration System2
- Native CI2
- HipChat intergration2
- Many private repo2
- Kubernetes Integration2
- Published IP list for whitelisting (gl-infra#434)2
- Wounderful2
- Beautiful2
- Groups of groups2
- The dashboard with deployed environments2
- It includes everything I need, all packaged with docker2
- Supports Radius/Ldap & Browser Code Edits1
- Slow ui performance28
- Introduce breaking bugs every release8
- Insecure (no published IP list for whitelisting)6
- Built-in Docker Registry2
- Review Apps feature1
related GitLab posts
I have mixed feelings on GitHub as a product and our use of it for the Zulip open source project. On the one hand, I do feel that being on GitHub helps people discover Zulip, because we have enough stars (etc.) that we rank highly among projects on the platform. and there is a definite benefit for lowering barriers to contribution (which is important to us) that GitHub has such a dominant position in terms of what everyone has accounts with.
But even ignoring how one might feel about their new corporate owner (MicroSoft), in a lot of ways GitHub is a bad product for open source projects. Years after the "Dear GitHub" letter, there are still basic gaps in its issue tracker:
- You can't give someone permission to label/categorize issues without full write access to a project (including ability to merge things to master, post releases, etc.).
- You can't let anyone with a GitHub account self-assign issues to themselves.
- Many more similar issues.
It's embarrassing, because I've talked to GitHub product managers at various open source events about these things for 3 years, and they always agree the thing is important, but then nothing ever improves in the Issues product. Maybe the new management at MicroSoft will fix their product management situation, but if not, I imagine we'll eventually do the migration to GitLab.
We have a custom bot project, http://github.com/zulip/zulipbot, to deal with some of these issues where possible, and every other large project we talk to does the same thing, more or less.
We use GitLab CI because of the great native integration as a part of the GitLab framework and the linting-capabilities it offers. The visualization of complex pipelines and the embedding within the project overview made Gitlab CI even more convenient. We use it for all projects, all deployments and as a part of GitLab Pages.
While we initially used the Shell-executor, we quickly switched to the Docker-executor and use it exclusively now.
We formerly used Jenkins but preferred to handle everything within GitLab . Aside from the unification of our infrastructure another motivation was the "configuration-in-file"-approach, that Gitlab CI offered, while Jenkins support of this concept was very limited and users had to resort to using the webinterface. Since the file is included within the repository, it is also version controlled, which was a huge plus for us.
- Great customer support18
- Github integration17
- Easy to use16
- Easy setup16
- Simplicity12
- Simple deployments10
- Simple and powerful configuration9
- Bitbucket integration4
- Github enterprise integration3
- Amazing swag3
- Integrates with everything2
- Sourcecode is hosted by source code owner.1
- Configuration in cloud1
- Run your own test containers with their AWS stack file1
- Superior user experience1
- Great ui1
related Buildkite posts
Travis CI
- Github integration506
- Free for open source388
- Easy to get started271
- Nice interface191
- Automatic deployment162
- Tutorials for each programming language72
- Friendly folks40
- Support for multiple ruby versions29
- Osx support28
- Easy handling of secret keys24
- Fast builds6
- Support for students4
- The best tool for Open Source CI3
- Hosted3
- Build Matrices3
- Github Pull Request build2
- Straightforward Github/Coveralls integration2
- Easy of Usage2
- Integrates with everything2
- Caching resolved artifacts1
- Docker support1
- Great Documentation1
- Build matrix1
- No-brainer for CI1
- Debug build workflow1
- Ubuntu trusty is not supported1
- Free for students1
- Configuration saved with project repository1
- Multi-threaded run1
- Hipchat Integration1
- Perfect0
- Can't be hosted insternally8
- Feature lacking3
- Unstable3
- Incomplete documentation for all platforms2
related Travis CI posts
Releasing new versions of our services is done by Travis CI. Travis first runs our test suite. Once it passes, it publishes a new release binary to GitHub.
Common tasks such as installing dependencies for the Go project, or building a binary are automated using plain old Makefiles. (We know, crazy old school, right?) Our binaries are compressed using UPX.
Travis has come a long way over the past years. I used to prefer Jenkins in some cases since it was easier to debug broken builds. With the addition of the aptly named “debug build” button, Travis is now the clear winner. It’s easy to use and free for open source, with no need to maintain anything.
#ContinuousIntegration #CodeCollaborationVersionControl
We actually started out on Travis CI, but we've migrated our main builds to CircleCI, and it's been a huge improvement.
The reason it's been a huge improvement is that Travis CI has a fundamentally bad design for their images, where they start with a standard base Linux image containing tons of packages (several versions of postgres, every programming language environment, etc). This is potentially nice for the "get builds for a small project running quickly" use case, but it's a total disaster for a larger project that needs a decent number of dependencies and cares about the performance and reliability of their build.
This issue is exacerbated by their networking infrastructure being unreliable; we usually saw over 1% of builds failing due to transient networking errors in Travis CI, even after we added retries to the most frequently failing operations like apt update
or pip install
. And they never install Ubuntu's point release updates to their images. So doing an apt update
, apt install
, or especially apt upgrade
would take forever. We ended up writing code to actually uninstall many of their base packages and pin the versions of hundreds of others to get a semi-fast, semi-reliable build. It was infuriating.
The CircleCI v2.0 system has the right design for a CI system: we can customize the base image to start with any expensive-to-install packages we need for our build, and we can update that image if and when we want to. The end result is that when migrating, we were able to delete all the hacky optimizations mentioned above, while still ending up with a 50% faster build latency. And we've also had 5-10x fewer issues with networking-related flakes, which means one doesn't have to constantly check whether a build failure is actually due to an issue with the code under test or "just another networking flake".
Airflow
- Features50
- Task Dependency Management14
- Beautiful UI12
- Cluster of workers12
- Extensibility10
- Open source6
- Complex workflows5
- Python5
- Good api3
- Apache project3
- Custom operators3
- Dashboard2
- Observability is not great when the DAGs exceed 2502
- Running it on kubernetes cluster relatively complex2
- Open source - provides minimum or no support2
- Logical separation of DAGs is not straight forward1
related Airflow posts
I am working on a project that grabs a set of input data from AWS S3, pre-processes and divvies it up, spins up 10K batch containers to process the divvied data in parallel on AWS Batch, post-aggregates the data, and pushes it to S3.
I already have software patterns from other projects for Airflow + Batch but have not dealt with the scaling factors of 10k parallel tasks. Airflow is nice since I can look at which tasks failed and retry a task after debugging. But dealing with that many tasks on one Airflow EC2 instance seems like a barrier. Another option would be to have one task that kicks off the 10k containers and monitors it from there.
I have no experience with AWS Step Functions but have heard it's AWS's Airflow. There looks to be plenty of patterns online for Step Functions + Batch. Do Step Functions seem like a good path to check out for my use case? Do you get the same insights on failing jobs / ability to retry tasks as you do with Airflow?
I am looking for an open-source scheduler tool with cross-functional application dependencies. Some of the tasks I am looking to schedule are as follows:
- Trigger Matillion ETL loads
- Trigger Attunity Replication tasks that have downstream ETL loads
- Trigger Golden gate Replication Tasks
- Shell scripts, wrappers, file watchers
- Event-driven schedules
I have used Airflow in the past, and I know we need to create DAGs for each pipeline. I am not familiar with Jenkins, but I know it works with configuration without much underlying code. I want to evaluate both and appreciate any advise