Alternatives to Spring Batch

Hadoop, Talend, Spring Boot, Apache Spark, and Kafka are the most popular alternatives and competitors to Spring Batch.

What is Spring Batch and what are its top alternatives?

Spring Batch is designed to enable the development of robust batch applications vital to the daily operations of enterprise systems. It also provides reusable functions that are essential for processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management.
Spring Batch is a tool in the Frameworks (Full Stack) category of a tech stack.
Spring Batch is an open source tool with 2.4K GitHub stars and 2.2K GitHub forks.
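To make the chunk processing, skip, and statistics features mentioned above more concrete, here is a minimal plain-Java sketch of a read-process-write loop with a skip limit. It is an illustration under stated assumptions, not the Spring Batch API: the names ChunkSketch, run, chunkSize, and skipLimit are hypothetical, there is no real transaction or restart support, and a chunk here closes once chunkSize output items have accumulated, which simplifies how Spring Batch counts its commit interval.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical, dependency-free illustration of chunk-oriented processing.
// This is NOT the Spring Batch API; all names here are invented for the sketch.
final class ChunkSketch {

    /** Simple run counters, similar in spirit to job processing statistics. */
    static final class Stats {
        int read;
        int written;
        int skipped;
        int chunks;
    }

    static <I, O> Stats run(Iterator<I> reader,
                            Function<I, O> processor,
                            Consumer<List<O>> writer,
                            int chunkSize,
                            int skipLimit) {
        Stats stats = new Stats();
        List<O> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            I item = reader.next();
            stats.read++;
            try {
                O out = processor.apply(item);
                if (out != null) {          // null means "filter this item out"
                    chunk.add(out);
                }
            } catch (RuntimeException e) {
                // Skip policy: tolerate bad items until the limit is exceeded.
                if (++stats.skipped > skipLimit) {
                    throw new IllegalStateException("Skip limit exceeded", e);
                }
            }
            if (chunk.size() == chunkSize) {
                flush(chunk, writer, stats);
            }
        }
        if (!chunk.isEmpty()) {
            flush(chunk, writer, stats);    // write the final, partial chunk
        }
        return stats;
    }

    private static <O> void flush(List<O> chunk, Consumer<List<O>> writer, Stats stats) {
        // A real framework would wrap this write in one transaction per chunk.
        writer.accept(new ArrayList<>(chunk));
        stats.written += chunk.size();
        stats.chunks++;
        chunk.clear();
    }
}
```

In Spring Batch itself, the reader, processor, and writer roles are played by the ItemReader, ItemProcessor, and ItemWriter interfaces of a chunk-oriented step, and the framework commits a transaction per chunk, which is what makes restart after a failure practical.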

Top Alternatives to Spring Batch

  • Hadoop

    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. ...

  • Talend

    Talend is an open source software integration platform that helps you effortlessly turn data into business insights. It uses native code generation that lets you run your data pipelines seamlessly across all cloud providers and get optimized performance on all platforms. ...

  • Spring Boot

    Spring Boot makes it easy to create stand-alone, production-grade Spring based Applications that you can "just run". We take an opinionated view of the Spring platform and third-party libraries so you can get started with minimum fuss. Most Spring Boot applications need very little Spring configuration. ...

  • Apache Spark

    Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning. ...

  • Kafka

    Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design. ...

  • AWS Batch

    AWS Batch enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. It dynamically provisions the optimal quantity and type of compute resources (e.g., CPU- or memory-optimized instances) based on the volume and specific resource requirements of the batch jobs submitted. ...

  • Node.js

    Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices. ...

  • Django

    Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. ...
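Hadoop's "simple programming models" and Spark's "batch processing (similar to MapReduce)" both refer to the map, shuffle, and reduce pattern. The sketch below runs that pattern on a word count inside a single JVM using plain collections. It is a hypothetical illustration (WordCountSketch and its methods are invented here), not Hadoop's Mapper/Reducer API, and it leaves out distribution, fault tolerance, and data locality entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Plain-Java illustration of map -> shuffle -> reduce; not the Hadoop API.
final class WordCountSketch {

    // Map phase: each input line emits (word, 1) pairs independently.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group emitted values by key, with keys kept in sorted order.
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: combine all values that share one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static SortedMap<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));        // on a cluster, mappers run in parallel
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((k, vs) -> result.put(k, reduce(k, vs))); // one reduce call per key
        return result;
    }
}
```

For example, wordCount(List.of("the cat", "The dog")) yields {cat=1, dog=1, the=2}. The point of the model is that map and reduce are pure per-record and per-key functions, so a framework can spread them over many machines without the user writing any coordination code.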

Spring Batch alternatives & related posts

Hadoop

Open-source software for reliable, scalable, distributed computing
PROS OF HADOOP
  • Great ecosystem (39)
  • One stack to rule them all (11)
  • Great load balancer (4)
  • Amazon AWS (1)
  • Java syntax (1)

    related Hadoop posts

    Shared insights on Kafka and Hadoop
    The early data ingestion pipeline at Pinterest used Kafka as the central message transporter, with the app servers writing messages directly to Kafka, which then uploaded log files to S3.

    For databases, a custom Hadoop streamer pulled database data and wrote it to S3.

    Challenges cited for this infrastructure included high operational overhead, as well as potential data loss occurring when Kafka broker outages led to an overflow of in-memory message buffering.
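The data-loss failure mode described above can be illustrated with a bounded in-memory buffer sitting in front of a broker. This is a hypothetical sketch (BufferedLogSender and its methods are invented names, not Pinterest's code or the Kafka client API): while the broker is unreachable, messages accumulate until the buffer is full, and every later message is dropped because the sender never blocks the app server.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical producer-side buffer, used only to illustrate overflow-driven data loss.
final class BufferedLogSender {
    private final BlockingQueue<String> buffer;
    private final AtomicLong dropped = new AtomicLong();

    BufferedLogSender(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    /** Called by app servers; never blocks. Returns false if the message was lost. */
    boolean send(String message) {
        boolean accepted = buffer.offer(message);   // offer() fails immediately when full
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    /** Drains buffered messages when the broker is reachable; returns how many were sent. */
    int flush(boolean brokerAvailable) {
        if (!brokerAvailable) {
            return 0;                               // outage: messages keep piling up in memory
        }
        int delivered = 0;
        while (buffer.poll() != null) {             // stand-in for the real network write
            delivered++;
        }
        return delivered;
    }

    long droppedCount() {
        return dropped.get();
    }

    int buffered() {
        return buffer.size();
    }
}
```

With a capacity of 3 and five messages sent during an outage, three are buffered and two are lost for good. Blocking the caller or spilling to local disk are the usual alternatives to dropping, each trading data loss for added latency or operational complexity.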

    Conor Myhrvold
    Tech Brand Mgr, Office of CTO at Uber

    Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop:

    Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse it to any sink, leveraging Apache Spark. The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:

    https://eng.uber.com/marmaray-hadoop-ingestion-open-source/

    (Direct GitHub repo: https://github.com/uber/marmaray)

Talend

A single, unified suite for all integration needs

Spring Boot

Create Spring-powered, production-grade applications and services with absolute minimum fuss
PROS OF SPRING BOOT
  • Powerful and handy (142)
  • Easy setup (133)
  • Java (125)
  • Spring (90)
  • Fast (85)
  • Extensible (46)
  • Lots of "off the shelf" functionalities (37)
  • Cloud Solid (32)
  • Caches well (26)
  • Many recipes around for obscure features (24)
  • Productive (24)
  • Modular (23)
  • Integrations with most other Java frameworks (23)
  • Spring ecosystem is great (22)
  • Fast Performance With Microservices (21)
  • Auto-configuration (20)
  • Community (18)
  • Easy setup, Community Support, Solid for ERP apps (17)
  • One-stop shop (15)
  • Easy to parallelize (14)
  • Cross-platform (14)
  • Easy setup, good for building ERP systems, well documented (13)
  • Powerful 3rd party libraries and frameworks (13)
  • Easy setup, Git Integration (12)
  • It's so much easier to start a project with Spring (5)
  • Kotlin (4)
  • The ability to integrate with the open source ecosystem (1)
  • Microservice and Reactive Programming (1)
CONS OF SPRING BOOT
  • Heavyweight (23)
  • Annotation ceremony (18)
  • Java (13)
  • Many config files needed (11)
  • Reactive (5)
  • Excellent tools for cloud hosting, since 5.x (4)

related Spring Boot posts

        Praveen Mooli
        Engineering Manager at Taylor and Francis

        We are in the process of building a modern content platform to deliver our content through various channels. We decided to go with a microservices architecture because we wanted scale. The microservice architecture style is an approach to developing an application as a suite of small, independently deployable services built around specific business capabilities. You can gain modularity, extensive parallelism, and cost-effective scaling by deploying services across many distributed servers. Microservice modularity facilitates independent updates and deployments, and helps avoid a single point of failure, which can help prevent large-scale outages. We also decided to use the event-driven architecture pattern, a popular distributed asynchronous architecture pattern used to produce highly scalable applications. An event-driven architecture is made up of highly decoupled, single-purpose event processing components that asynchronously receive and process events.
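As a rough, single-process illustration of the "highly decoupled, single-purpose event processing components that asynchronously receive and process events" described in this post, the sketch below uses a tiny topic-based event bus backed by a thread pool. It is hypothetical: EventBus, the topic names, and the handlers are invented for illustration, and in the architecture the post describes this role is played by managed services such as Amazon Kinesis, Amazon SNS, and Amazon SQS rather than an in-memory class.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical in-process event bus; a stand-in for a real message broker.
final class EventBus implements AutoCloseable {
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newFixedThreadPool(4);

    /** Each component subscribes to the topics it cares about and nothing else. */
    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    /** The publisher does not know who consumes the event; delivery is asynchronous. */
    void publish(String topic, String event) {
        for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
            executor.submit(() -> handler.accept(event));
        }
    }

    /** Stops accepting work and waits briefly for in-flight handlers to finish. */
    @Override
    public void close() {
        executor.shutdown();
        try {
            if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

Adding a new consumer (for example, a search indexer) is just another subscribe call; the publishing service does not change, which is the decoupling the post is after. Events published to a topic with no subscribers are simply ignored in this sketch, whereas a durable broker would retain them.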

        To build our #Backend capabilities we decided to use the following:
        1. #Microservices - Java with Spring Boot, Node.js with ExpressJS and Python with Flask
        2. #Eventsourcingframework - Amazon Kinesis, Amazon Kinesis Firehose, Amazon SNS, Amazon SQS, AWS Lambda
        3. #Data - Amazon RDS, Amazon DynamoDB, Amazon S3, MongoDB Atlas

        To build #Webapps we decided to use Angular 2 with RxJS

        #Devops - GitHub , Travis CI , Terraform , Docker , Serverless


        Is learning Spring and Spring Boot for web app back-end development still relevant in 2021? Feel free to share your views in comparison to Django, Node.js, ExpressJS or other frameworks.

        Please share some good beginner resources for learning the Spring/Spring Boot framework to build web apps.

Apache Spark

Fast and general engine for large-scale data processing
PROS OF APACHE SPARK
  • Open-source (60)
  • Fast and Flexible (48)
  • Great for distributed SQL-like applications (8)
  • One platform for every big data problem (8)
  • Easy to install and to use (6)
  • Works well for most data science use cases (3)
  • In-memory computation (2)
  • Interactive Query (2)
  • Machine learning library, real-time streaming (2)
CONS OF APACHE SPARK
  • Speed (3)

related Apache Spark posts

        Eric Colson
        Chief Algorithms Officer at Stitch Fix

        The algorithms and data infrastructure at Stitch Fix are housed in #AWS. Data acquisition is split between events flowing through Kafka and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3-based data warehouse. Apache Spark on YARN is our tool of choice for data movement and #ETL. Because our storage layer (S3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling YARN clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for ad hoc queries and dashboards.

        Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).

        At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated into our systems. That requires a serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan gives our data scientists the ability to quickly productionize the models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn) by automatically packaging them as Docker containers and deploying them to Amazon ECS. This gives our data scientists a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.

        For more info:

        #DataScience #DataStack #Data

Kafka

Distributed, fault tolerant, high throughput pub-sub messaging system
PROS OF KAFKA
  • High-throughput (126)
  • Distributed (119)
  • Scalable (92)
  • High-Performance (86)
  • Durable (66)
  • Publish-Subscribe (38)
  • Simple-to-use (19)
  • Open source (18)
  • Written in Scala and Java; runs on the JVM (11)
  • Message broker + Streaming system (8)
  • Robust (4)
  • Avro schema integration (4)
  • KSQL (4)
  • Supports multiple clients (3)
  • Partitioned, replayable log (2)
  • Simple publisher / multi-subscriber model (1)
  • Flexible (1)
  • Extremely good parallelism constructs (1)
  • Fun (1)
CONS OF KAFKA
  • Non-Java clients are second-class citizens (32)
  • Needs ZooKeeper (29)
  • Operational difficulties (9)
  • Terrible Packaging (4)
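
Kafka is described above as a distributed, partitioned, replicated commit log, and voters list a "partitioned, replayable log" among its strengths. The in-memory sketch below models only the partitioned, append-only part: records with the same key land in the same partition, each record gets a per-partition offset, and reads do not remove data, so a consumer can replay from any offset. PartitionedLog is a hypothetical class; replication, persistence, and consumer groups are omitted, and Kafka's default producer partitioner hashes the serialized key with murmur2 rather than using Java's hashCode as done here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a partitioned, append-only log (no replication, no disk).
final class PartitionedLog {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedLog(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Same key -> same partition, so ordering is preserved per key (hashCode is an assumption). */
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    /** Appends to the end of the key's partition and returns the record's offset there. */
    long append(String key, String value) {
        List<String> log = partitions.get(partitionFor(key));
        log.add(value);
        return log.size() - 1L;
    }

    /** Reads up to maxRecords starting at fromOffset; reading never deletes records. */
    List<String> read(int partition, long fromOffset, int maxRecords) {
        List<String> log = partitions.get(partition);
        int start = (int) Math.min(fromOffset, log.size());
        int end = Math.min(start + maxRecords, log.size());
        return List.copyOf(log.subList(start, end));
    }
}
```

Because each consumer tracks its own offset instead of the log deleting consumed messages, several independent consumers can read the same partition at their own pace, which is what separates this design from a classic work queue.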

        related Kafka posts

        John Kodumal

        As we've evolved or added additional infrastructure to our stack, we've biased towards managed services. Most new backing stores are Amazon RDS instances now. We do use self-managed PostgreSQL with TimescaleDB for time-series data—this is made HA with the use of Patroni and Consul.

        We also use managed Amazon ElastiCache instances instead of spinning up Amazon EC2 instances to run Redis workloads, as well as shifting to Amazon Kinesis instead of Kafka.

AWS Batch

Fully Managed Batch Processing at Any Scale
PROS OF AWS BATCH
  • Containerized (3)
  • Scalable (3)
CONS OF AWS BATCH
  • More overhead than Lambda (2)
  • Image management (1)

Node.js

A platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications
PROS OF NODE.JS
  • npm (1.4K)
  • JavaScript (1.3K)
  • Great libraries (1.1K)
  • High-performance (1K)
  • Open source (802)
  • Great for APIs (485)
  • Asynchronous (475)
  • Great community (421)
  • Great for realtime apps (390)
  • Great for command line utilities (296)
  • Websockets (82)
  • Node Modules (82)
  • Uber Simple (69)
  • Great modularity (59)
  • Allows us to reuse code in the frontend (58)
  • Easy to start (42)
  • Great for Data Streaming (35)
  • Realtime (32)
  • Awesome (28)
  • Non-blocking IO (25)
  • Can be used as a proxy (18)
  • High performance, open source, scalable (17)
  • Non-blocking and modular (16)
  • Easy and Fun (15)
  • Easy and powerful (14)
  • Future of BackEnd (13)
  • Same lang as AngularJS (13)
  • Fullstack (12)
  • Fast (11)
  • Scalability (10)
  • Cross platform (10)
  • Simple (9)
  • MEAN Stack (8)
  • Great for webapps (7)
  • Easy concurrency (7)
  • React (6)
  • Fast, simple code and async (6)
  • Friendly (6)
  • TypeScript (6)
  • Fast development (5)
  • It's amazingly fast and scalable (5)
  • Easy to use and fast and goes well with JSON DBs (5)
  • Scalable (5)
  • Great speed (5)
  • Control everything (5)
  • Easy to use (4)
  • It's fast (4)
  • Isomorphic coolness (4)
  • Easy (3)
  • Easy to learn (3)
  • Great community (3)
  • Not Python (3)
  • Super easy backend connectivity (3)
  • TypeScript Support (3)
  • Scales, fast, simple, great community, npm, express (3)
  • One language, end-to-end (3)
  • Less boilerplate code (3)
  • Performant and fast prototyping (3)
  • Blazing fast (3)
  • Npm i ape-updating (2)
  • Event Driven (2)
  • Lovely (2)
  • Great for APIs (1)
  • Node (0)
CONS OF NODE.JS
  • Bound to a single CPU (46)
  • New framework every day (44)
  • Lots of terrible examples on the internet (38)
  • Asynchronous programming is the worst (31)
  • Callback (23)
  • JavaScript (18)
  • Dependency based on GitHub (11)
  • Dependency hell (11)
  • Low computational power (10)
  • Very very Slow (7)
  • Can block whole server easily (7)
  • Callback functions may not fire in the expected sequence (6)
  • Unneeded overcomplication (3)
  • Unstable (3)
  • Breaking updates (3)
  • No standard approach (2)
  • Bad transitive dependency management (1)
  • Can't read server session (1)

related Node.js posts

        Nick Rockwell
        SVP, Engineering at Fastly

        When I joined NYT there was already broad dissatisfaction with the LAMP (Linux Apache HTTP Server MySQL PHP) Stack and the front end framework, in particular. So, I wasn't passing judgment on it. I mean, LAMP's fine, you can do good work in LAMP. It's a little dated at this point, but it's not ... I didn't want to rip it out for its own sake, but everyone else was like, "We don't like this, it's really inflexible." And I remember from being outside the company when that was called MIT FIVE when it had launched. And been observing it from the outside, and I was like, you guys took so long to do that and you did it so carefully, and yet you're not happy with your decisions. Why is that? That was more the impetus. If we're going to do this again, how are we going to do it in a way that we're gonna get a better result?

        So we're moving quickly away from LAMP, I would say. So, right now, the new front end is React based and using Apollo. And we've been in a long, protracted, gradual rollout of the core experiences.

        React is now talking to GraphQL as a primary API. There's a Node.js back end, to the front end, which is mainly for server-side rendering, as well.

        Behind there, the main repository for the GraphQL server is a big table repository, that we call Bodega because it's a convenience store. And that reads off of a Kafka pipeline.

        Conor Myhrvold
        Tech Brand Mgr, Office of CTO at Uber

        How Uber developed the open source, end-to-end distributed tracing system Jaeger, now a CNCF project:

        Distributed tracing is quickly becoming a must-have component in the tools that organizations use to monitor their complex, microservice-based architectures. At Uber, our open source distributed tracing system Jaeger saw large-scale internal adoption throughout 2016, integrated into hundreds of microservices and now recording thousands of traces every second.

        Here is the story of how we got here, from investigating off-the-shelf solutions like Zipkin, to why we switched from pull to push architecture, and how distributed tracing will continue to evolve:

        https://eng.uber.com/distributed-tracing/

        (GitHub Pages: https://www.jaegertracing.io/, GitHub: https://github.com/jaegertracing/jaeger)

        Bindings/Operator: Python, Java, Node.js, Go, C++, Kubernetes, JavaScript, OpenShift, C#, Apache Spark

Django

The Web framework for perfectionists with deadlines
PROS OF DJANGO
  • Rapid development (660)
  • Open source (480)
  • Great community (416)
  • Easy to learn (371)
  • MVC (271)
  • Beautiful code (225)
  • Elegant (217)
  • Free (201)
  • Great packages (198)
  • Great libraries (186)
  • RESTful (74)
  • Comes with auth and CRUD admin panel (73)
  • Powerful (72)
  • Great documentation (69)
  • Great for web (65)
  • Python (52)
  • Great ORM (39)
  • Great for API (37)
  • All included (28)
  • Fast (25)
  • Web Apps (23)
  • Clean (21)
  • Used by top startups (20)
  • Easy setup (19)
  • Sexy (18)
  • ORM (14)
  • Convention over configuration (14)
  • Allows for very rapid development with great libraries (13)
  • The Django community (12)
  • Great MVC and templating engine (10)
  • King of backend world (10)
  • Full stack (8)
  • It's elegant and practical (7)
  • Batteries included (7)
  • Cross-Platform (6)
  • Very quick to get something up and running (6)
  • Have not found anything that it can't do (6)
  • Fast prototyping (6)
  • MVT (6)
  • Easy structure, useful built-in library (5)
  • Zero code burden to change databases (5)
  • Easy to develop end-to-end AI Models (5)
  • Map (4)
  • Python community (4)
  • Easy to use (4)
  • Easy to change database manager (4)
  • Modular (4)
  • Great performance (4)
  • Easy (4)
  • Many libraries (4)
  • Full-Text Search (3)
  • Just the right level of abstraction (3)
  • Scaffold (3)
  • Scalable (1)
  • Node js (1)
  • Rails (0)
  • Fastapi (0)
CONS OF DJANGO
  • Underpowered templating (26)
  • Autoreload restarts whole server (22)
  • Underpowered ORM (22)
  • URL dispatcher ignores HTTP method (15)
  • Internal subcomponents coupling (10)
  • Not Node.js (8)
  • Configuration hell (8)
  • Admin (7)
  • Documentation not as clean and nice as Laravel's (5)
  • Python (3)
  • Not typed (3)
  • Bloated admin panel included (3)
  • Overwhelming folder structure (2)
  • Ineffective multithreading (2)
  • Not type safe (1)

related Django posts

        Dmitry Mukhin
        Engineer at Uploadcare

        Simple controls over complex technologies, as we put it, wouldn't be possible without neat UIs for our user areas including start page, dashboard, settings, and docs.

        Initially, there was Django. Back in 2011, considering our Python-centric approach, that was the best choice. Later, we realized we needed to iterate on our website more quickly. And this led us to detaching Django from our front end. That was when we decided to build an SPA.

        For building user interfaces, we're currently using React as it provided the fastest rendering back when we were building our toolkit. It’s worth mentioning Uploadcare is not a front-end-focused SPA: we aren’t running at high levels of complexity. If it were, we’d go with Ember.js.

        However, there's a chance we will shift to the faster Preact, with its motto of using as little code as possible, and because it makes more use of browser APIs. One of our future tasks for our front end is to configure our Webpack bundler to split up the code for different site sections. For styles, we use PostCSS along with its plugins such as cssnano which minifies all the code.

        All that allows us to provide a great user experience and quickly implement changes where they are needed with as little code as possible.


        Hey, so I developed a basic application with Python. But to use it, you need a Python interpreter. I want to add a GUI to make it more appealing. What should I choose to develop a GUI? I have very basic skills in front-end development (CSS, JavaScript). I am fluent in Python. I'm looking for a tool that is easy to use and doesn't require too much code knowledge. I have recently tried out Flask, but it is kinda complicated. Should I stick with it, move to Django, or is there another nice framework to use?
