Need advice about which tool to choose?Ask the StackShare community!
I am looking for an open-source scheduler tool with cross-functional application dependencies. Some of the tasks I am looking to schedule are as follows:
- Trigger Matillion ETL loads
- Trigger Attunity Replication tasks that have downstream ETL loads
- Trigger Golden gate Replication Tasks
- Shell scripts, wrappers, file watchers
- Event-driven schedules
I have used Airflow in the past, and I know we need to create DAGs for each pipeline. I am not familiar with Jenkins, but I know it works with configuration without much underlying code. I want to evaluate both and appreciate any advise

Hi Teja, Jenkins is more a CI/CD tool for triggering build/test and other CD tasks, from what you're describing you may be able to get along with #Jenkins and add lot's of plugins and create pipelines. But, eventually you're going to need to know #Groovy language to orchestrate all those tasks which is going to be similar to what you do with #Airflow, So, IMHO , Airflow is more for production scheduled tasks and Jenkins is more for CI/CD non-production tasks.

For 1 - 3, Airflow is a better solution.
Cron and OSQuery may be better solutions for 4 and 5, depending on what you're actually trying to do.
In either case, Jenkins is more trouble than it's worth for these types of workloads.
We are currently using Azure Pipelines for continous integration. Our applications are developed witn .NET framework. But when we look at the online Jenkins is the most widely used tool for continous integration. Can you please give me the advice which one is best to use for my case Azure pipeline or jenkins.

If your source code is on GitHub, also take a look at Github actions. https://github.com/features/actions
I'm open to anything. just want something that break less and doesn't need me to pay for it, and can be hosted on Docker. our scripting language is powershell core. so it's better to support it. also we are building dotnet core in our pipeline, so if they have anything related that helps with the CI would be nice.

Google cloud build can help you. It is hosted on cloud and also provide reasonable free quota.
I'm planning to setup complete CD-CD setup for spark and python application which we are going to deploy in aws lambda and EMR Cluster. Which tool would be best one to choose. Since my company is trying to adopt to concourse i would like to understand what are the lack of capabilities concourse have . Thanks in advance !

I would definetly recommend Concourse to you, as it is one of the most advanced modern methods of making CI/CD while Jenkins is an old monolithic dinosaur. Concourse itself is cloudnative and containerbased which helps you to build simple, high-performance and scalable CI/CD pipelines. In my opinion, the only lack of skills you have with Concourse is your own knowledge of how to build pipelines and automate things. Technincally there is no lack, i would even say you can extend it way more easily. But as a Con it is more easy to interact with Jenkins if you are only used to UIs. Concourse needs someone which is capable of using CLIs.
I am so confused. I need a tool that will allow me to go to about 10 different URLs to get a list of objects. Those object lists will be hundreds or thousands in length. I then need to get detailed data lists about each object. Those detailed data lists can have hundreds of elements that could be map/reduced somehow. My batch process dies sometimes halfway through which means hours of processing gone, i.e. time wasted. I need something like a directed graph that will keep results of successful data collection and allow me either pragmatically or manually to retry the failed ones some way (0 - forever) times. I want it to then process all the ones that have succeeded or been effectively ignored and load the data store with the aggregation of some couple thousand data-points. I know hitting this many endpoints is not a good practice but I can't put collectors on all the endpoints or anything like that. It is pretty much the only way to get the data.

For a non-streaming approach:
You could consider using more checkpoints throughout your spark jobs. Furthermore, you could consider separating your workload into multiple jobs with an intermittent data store (suggesting cassandra or you may choose based on your choice and availability) to store results , perform aggregations and store results of those.
Spark Job 1 - Fetch Data From 10 URLs and store data and metadata in a data store (cassandra) Spark Job 2..n - Check data store for unprocessed items and continue the aggregation
Alternatively for a streaming approach: Treating your data as stream might be useful also. Spark Streaming allows you to utilize a checkpoint interval - https://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing
From a StackShare Community member: "Currently we use Travis CI and have optimized it as much as we can so our builds are fairly quick. Our boss is all about redundancy so we are looking for another solution to fall back on in case Travis goes down and/or jacks prices way up (they were recently acquired). Could someone recommend which CI we should go with and if they have time, an explanation of how they're different?"
We use CircleCI because of the better value it provides in its plans. I'm sure we could have used Travis just as easily but we found CircleCI's pricing to be more reasonable. In the two years since we signed up, the service has improved. CircleCI is always innovating and iterating on their platform. We have been very satisfied.
As the maintainer of the Karate DSL open-source project - I found Travis CI very easy to integrate into the GitHub workflow and it has been steady sailing for more than 2 years now ! It works well for Java / Apache Maven projects and we were able to configure it to use the latest Oracle JDK as per our needs. Thanks to the Travis CI team for this service to the open-source community !

I use Google Cloud Build because it's my first foray into the CICD world(loving it so far), and I wanted to work with something GCP native to avoid giving permissions to other SaaS tools like CircleCI and Travis CI.
I really like it because it's free for the first 120 minutes, and it's one of the few CICD tools that enterprises are open to using since it's contained within GCP.
One of the unique things is that it has the Kaniko cache, which speeds up builds by creating intermediate layers within the docker image vs. pushing the full thing from the start. Helpful when you're installing just a few additional dependencies.
Feel free to checkout an example: Cloudbuild Example

I use Travis CI because of various reasons - 1. Cloud based system so no dedicated server required, and you do not need to administrate it. 2. Easy YAML configuration. 3. Supports Major Programming Languages. 4. Support of build matrix 6. Supports AWS, Azure, Docker, Heroku, Google Cloud, Github Pages, PyPi and lot more. 7. Slack Notifications.

You are probably looking at another hosted solution: Jenkins is a good tool but it way too work intensive to be used as just a backup solution.
I have good experience with Circle-CI, Codeship, Drone.io and Travis (as well as problematic experiences with all of them), but my go-to tool is Gitlab CI: simple, powerful and if you have problems with their limitations or pricing, you can always install runners somewhere and use Gitlab just for scheduling and management. Even if you don't host your git repository at Gitlab, you can have Gitlab pull changes automatically from wherever you repo lives.

If you are considering Jenkins I would recommend at least checking out Buildkite. The agents are self-hosted (like Jenkins) but the interface is hosted for you. It meshes up some of the things I like about hosted services (pipeline definitions in YAML, managed interface and authentication) with things I like about Jenkins (local customizable agent images, secrets only on own instances, custom agent level scripts, sizing instances to your needs).
Jenkins is a pretty flexible, complete tool. Especially I love the possibility to configure jobs as a code with Jenkins pipelines.
CircleCI is well suited for small projects where the main task is to run continuous integration as quickly as possible. Travis CI is recommended primarily for open-source projects that need to be tested in different environments.
And for something a bit larger I prefer to use Jenkins because it is possible to make serious system configuration thereby different plugins. In Jenkins, I can change almost anything. But if you want to start the CI chain as soon as possible, Jenkins may not be the right choice.
Pros of Airflow
- Features49
- Task Dependency Management14
- Cluster of workers12
- Beautiful UI12
- Extensibility10
- Open source5
- Python5
- Complex workflows4
- Good api3
- Custom operators3
- Dashboard2
- Apache project2
Pros of Jenkins
- Hosted internally520
- Free open source464
- Great to build, deploy or launch anything async314
- Tons of integrations243
- Rich set of plugins with good documentation210
- Has support for build pipelines110
- Open source and tons of integrations72
- Easy setup65
- It is open-source62
- Workflow plugin54
- Configuration as code11
- Very powerful tool10
- Many Plugins9
- Continuous Integration9
- Great flexibility8
- Git and Maven integration is better8
- 100% free and open source7
- Github integration6
- Slack Integration (plugin)6
- Easy customisation5
- Self-hosted GitLab Integration (plugin)5
- Docker support4
- Pipeline API4
- Platform idnependency3
- Excellent docker integration3
- Fast builds3
- Hosted Externally3
- Customizable2
- AWS Integration2
- It's Everywhere2
- JOBDSL2
- Can be run as a Docker container2
- It`w worked2
- Easily extendable with seamless integration1
- Build PR Branch Only1
- NodeJS Support1
- PHP Support1
- Ruby/Rails Support1
- Universal controller1
- Loose Coupling1
Sign up to add or upvote prosMake informed product decisions
Cons of Airflow
- Running it on kubernetes cluster relatively complex2
- Open source - provides minimum or no support2
- Logical separation of DAGs is not straight forward1
- Observability is not great when the DAGs exceed 2501
Cons of Jenkins
- Workarounds needed for basic requirements12
- Groovy with cumbersome syntax9
- Plugins compatibility issues7
- Lack of support6
- Limited abilities with declarative pipelines6
- No YAML syntax4
- Too tied to plugins versions3