What is Spark Framework and what are its top alternatives?
Top Alternatives to Spark Framework
Apache Spark
Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning. ...
Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. ...
Spring
A key element of Spring is infrastructural support at the application level: Spring focuses on the "plumbing" of enterprise applications so that teams can focus on application-level business logic, without unnecessary ties to specific deployment environments. ...
Spring Boot
Spring Boot makes it easy to create stand-alone, production-grade Spring based Applications that you can "just run". We take an opinionated view of the Spring platform and third-party libraries so you can get started with minimum fuss. Most Spring Boot applications need very little Spring configuration. ...
ExpressJS
Express is a minimal and flexible node.js web application framework, providing a robust set of features for building single and multi-page, and hybrid web applications. ...
Flask
Flask is intended for getting started very quickly and was developed with best intentions in mind. ...
Django REST framework
It is a powerful and flexible toolkit that makes it easy to build Web APIs.
Sinatra
Sinatra is a DSL for quickly creating web applications in Ruby with minimal effort. ...
Spark Framework alternatives & related posts
- Open-source58
- Fast and Flexible47
- One platform for every big data problem7
- Easy to install and to use6
- Great for distributed SQL like applications6
- Works well for most Datascience usecases3
- Machine learning libratimery, Streaming in real2
- In memory Computation2
- Interactive Query0
- Speed3
related Apache Spark posts










The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka, and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on Yarn is our tool of choice for data movement and #ETL. Because our storage layer (s3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling Yarn clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for adhoc queries and dashboards.
Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).
At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated systems. That requires serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize those models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn), by automatically packaging them as Docker containers and deploying to Amazon ECS. This provides our data scientist a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.
For more info:
- Our Algorithms Tour: https://algorithms-tour.stitchfix.com/
- Our blog: https://multithreaded.stitchfix.com/blog/
- Careers: https://multithreaded.stitchfix.com/careers/
#DataScience #DataStack #Data
Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop :
Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse to any sink leveraging the use of Apache Spark . The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:
https://eng.uber.com/marmaray-hadoop-ingestion-open-source/
(Direct GitHub repo: https://github.com/uber/marmaray Kafka Kafka Manager )
- Great ecosystem38
- One stack to rule them all11
- Great load balancer4
- Java syntax1
related Hadoop posts
Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop :
Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse to any sink leveraging the use of Apache Spark . The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:
https://eng.uber.com/marmaray-hadoop-ingestion-open-source/
(Direct GitHub repo: https://github.com/uber/marmaray Kafka Kafka Manager )
The early data ingestion pipeline at Pinterest used Kafka as the central message transporter, with the app servers writing messages directly to Kafka, which then uploaded log files to S3.
For databases, a custom Hadoop streamer pulled database data and wrote it to S3.
Challenges cited for this infrastructure included high operational overhead, as well as potential data loss occurring when Kafka broker outages led to an overflow of in-memory message buffering.
Spring
- Java216
- Open source153
- Great community131
- Very powerful117
- Enterprise110
- Lot of great subprojects61
- Easy setup58
- Convention , configuration, done44
- Standard37
- Love the logic29
- Dependency injection10
- Good documentation10
- Stability9
- MVC6
- Easy6
- Makes the hard stuff fun & the easy stuff automatic3
- Strong typing3
- Great Desgin2
- Integrations with most other Java frameworks2
- Easy Integration with Spring Security2
- Maven2
- Best practices1
- Live project1
- OracleDb integration1
- Code maintenance1
- Large ecosystem with seamless integration1
- Java has more support and more libraries1
- Supports vast databases1
- Draws you into its own ecosystem and bloat12
- Verbose configuration2
- Poor documentation2
- Java1
related Spring posts
Is learning Spring and Spring Boot for web apps back-end development is still relevant in 2021? Feel free to share your views with comparison to Django/Node.js/ ExpressJS or other frameworks.
Please share some good beginner resources to start learning about spring/spring boot framework to build the web apps.
I am consulting for a company that wants to move its current CubeCart e-commerce site to another PHP based platform like PrestaShop or Magento. I was interested in alternatives that utilize Node.js as the primary platform. I currently don't know PHP, but I have done full stack dev with Java, Spring, Thymeleaf, etc.. I am just unsure that learning a set of technologies not commonly used makes sense. For example, in PrestaShop, I would need to work with JavaScript better and learn PHP, Twig, and Bootstrap. It seems more cumbersome than a Node JS system, where the language syntax stays the same for the full stack. I am looking for thoughts and advice on the relevance of PHP skillset into the future AND whether the Node based e-commerce open source options can compete with Magento or Prestashop.
Spring Boot
- Powerful and handy127
- Easy setup121
- Java111
- Spring83
- Fast79
- Extensible39
- Lots of "off the shelf" functionalities32
- Cloud Solid27
- Caches well21
- Many receipes around for obscure features19
- Modular18
- Productive18
- Integrations with most other Java frameworks17
- Spring ecosystem is great16
- Fast Performance With Microservices16
- Community14
- Auto-configuration13
- Easy setup, Community Support, Solid for ERP apps11
- One-stop shop11
- Easy to parallelize10
- Cross-platform9
- Easy setup, good for build erp systems, well documented9
- Easy setup, Git Integration8
- Powerful 3rd party libraries and frameworks8
- Kotlin2
- It's so easier to start a project on spring2
- Heavy weight18
- Annotation ceremony17
- Many config files needed10
- Java7
- Reactive5
- Excellent tools for cloud hosting, since 5.x4
related Spring Boot posts






















We are in the process of building a modern content platform to deliver our content through various channels. We decided to go with Microservices architecture as we wanted scale. Microservice architecture style is an approach to developing an application as a suite of small independently deployable services built around specific business capabilities. You can gain modularity, extensive parallelism and cost-effective scaling by deploying services across many distributed servers. Microservices modularity facilitates independent updates/deployments, and helps to avoid single point of failure, which can help prevent large-scale outages. We also decided to use Event Driven Architecture pattern which is a popular distributed asynchronous architecture pattern used to produce highly scalable applications. The event-driven architecture is made up of highly decoupled, single-purpose event processing components that asynchronously receive and process events.
To build our #Backend capabilities we decided to use the following: 1. #Microservices - Java with Spring Boot , Node.js with ExpressJS and Python with Flask 2. #Eventsourcingframework - Amazon Kinesis , Amazon Kinesis Firehose , Amazon SNS , Amazon SQS, AWS Lambda 3. #Data - Amazon RDS , Amazon DynamoDB , Amazon S3 , MongoDB Atlas
To build #Webapps we decided to use Angular 2 with RxJS
#Devops - GitHub , Travis CI , Terraform , Docker , Serverless
Is learning Spring and Spring Boot for web apps back-end development is still relevant in 2021? Feel free to share your views with comparison to Django/Node.js/ ExpressJS or other frameworks.
Please share some good beginner resources to start learning about spring/spring boot framework to build the web apps.
ExpressJS
- Simple362
- Node.js318
- Javascript234
- High performance183
- Robust routing147
- Open source66
- Middlewares63
- Great community51
- Hybrid web applications33
- Sinatra inspired8
- Well documented8
- Isomorphic js.. superfast and easy4
- Rapid development3
- Event loop2
- Socket connection2
- Npm2
- Resource available for learning2
- Light weight2
- Data stream1
- Callbacks1
- Xxx0
- Not python21
- Overrated15
- No multithreading14
- Javascript6
- Not fast4
- Easily Insecure for Novices2
related ExpressJS posts
















Our whole Node.js backend stack consists of the following tools:
- Lerna as a tool for multi package and multi repository management
- npm as package manager
- NestJS as Node.js framework
- TypeScript as programming language
- ExpressJS as web server
- Swagger UI for visualizing and interacting with the API’s resources
- Postman as a tool for API development
- TypeORM as object relational mapping layer
- JSON Web Token for access token management
The main reason we have chosen Node.js over PHP is related to the following artifacts:
- Made for the web and widely in use: Node.js is a software platform for developing server-side network services. Well-known projects that rely on Node.js include the blogging software Ghost, the project management tool Trello and the operating system WebOS. Node.js requires the JavaScript runtime environment V8, which was specially developed by Google for the popular Chrome browser. This guarantees a very resource-saving architecture, which qualifies Node.js especially for the operation of a web server. Ryan Dahl, the developer of Node.js, released the first stable version on May 27, 2009. He developed Node.js out of dissatisfaction with the possibilities that JavaScript offered at the time. The basic functionality of Node.js has been mapped with JavaScript since the first version, which can be expanded with a large number of different modules. The current package managers (npm or Yarn) for Node.js know more than 1,000,000 of these modules.
- Fast server-side solutions: Node.js adopts the JavaScript "event-loop" to create non-blocking I/O applications that conveniently serve simultaneous events. With the standard available asynchronous processing within JavaScript/TypeScript, highly scalable, server-side solutions can be realized. The efficient use of the CPU and the RAM is maximized and more simultaneous requests can be processed than with conventional multi-thread servers.
- A language along the entire stack: Widely used frameworks such as React or AngularJS or Vue.js, which we prefer, are written in JavaScript/TypeScript. If Node.js is now used on the server side, you can use all the advantages of a uniform script language throughout the entire application development. The same language in the back- and frontend simplifies the maintenance of the application and also the coordination within the development team.
- Flexibility: Node.js sets very few strict dependencies, rules and guidelines and thus grants a high degree of flexibility in application development. There are no strict conventions so that the appropriate architecture, design structures, modules and features can be freely selected for the development.
Hello, I hope everyone is doing good and safe. I need advice on what to learn more, I have started learning HTML, CSS, Bootstrap, JavaScript, Node.js, ExpressJS, React. eventually will learn MongoDB too. I would like to be a Front End developer or full-stack developer. What else would be the suggestion to get a job and what things I need to focus more on as a fresher to make my skills better. Do I have to be good in Algorithms and Dynamic Programming to find a job for entry-level? Looking forward to hearing from you guys for suggestions.Â
Flask
- Lightweight298
- Python257
- Minimal207
- Open source140
- Documentation95
- Easy to use62
- Easy to setup and get it going51
- Well designed51
- Easy to develop and maintain applications45
- Easy to get started43
- Beautiful code15
- Rapid development14
- Expressive12
- Powerful12
- Awesome11
- Love it10
- Speed10
- Simple to use9
- Flexibilty9
- Get started quickly8
- For it flexibility8
- Perfect for small to large projects with superb docs.8
- Flexibilty and easy to use7
- Easy to integrate7
- Productive7
- Customizable6
- Not JS6
- Secured5
- User friendly5
- Flask5
- Unopinionated3
- Not JS10
- Context7
- Not fast3
related Flask posts
One of our top priorities at Pinterest is fostering a safe and trustworthy experience for all Pinners. As Pinterest’s user base and ads business grow, the review volume has been increasing exponentially, and more content types require moderation support. To solve greater engineering and operational challenges at scale, we needed a highly-reliable and performant system to detect, report, evaluate, and act on abusive content and users and so we created Pinqueue.
Pinqueue-3.0 serves as a generic platform for content moderation and human labeling. Under the hood, Pinqueue3.0 is a Flask + React app powered by Pinterest’s very own Gestalt UI framework. On the backend, Pinqueue3.0 heavily relies on PinLater, a Pinterest-built reliable asynchronous job execution system, to handle the requests for enqueueing and action-taking. Using PinLater has significantly strengthened Pinqueue3.0’s overall infra with its capability of processing a massive load of events with configurable retry policies.
Hundreds of millions of people around the world use Pinterest to discover and do what they love, and our job is to protect them from abusive and harmful content. We’re committed to providing an inspirational yet safe experience to all Pinners. Solving trust & safety problems is a joint effort requiring expertise across multiple domains. Pinqueue3.0 not only plays a critical role in responsively taking down unsafe content, it also has become an enabler for future ML/automation initiatives by providing high-quality human labels. Going forward, we will continue to improve the review experience, measure review quality and collaborate with our machine learning teams to solve content moderation beyond manual reviews at an even larger scale.
Hey, so I developed a basic application with Python. But to use it, you need a python interpreter. I want to add a GUI to make it more appealing. What should I choose to develop a GUI? I have very basic skills in front end development (CSS, JavaScript). I am fluent in python. I'm looking for a tool that is easy to use and doesn't require too much code knowledge. I have recently tried out Flask, but it is kinda complicated. Should I stick with it, move to Django, or is there another nice framework to use?
- Browsable api63
- Easy to use62
- Great documentation52
- Customizable49
- Fast development41
- Easy to use, customizable, pluggable, serializer9
- Python8
- Django ORM5
- FastSerialize4
- Easy implementation2
- Less code2
- Dsasda0
- Bad documentation2
- Reimplements Django functionality2
- No support for URL Namespaces1
- Bad CSRF handling0
related Django REST framework posts
Zulip has been powered by Django since the very early days of its development with Django 1.4, back in 2012. As a reasonably mature web application with significant scale, we're at the stage in many companies' development where one starts to rip out more and more of the web framework to optimize things or just make them work the way we want. (E.g. while I was at Dropbox in early 2016, we discovered we only had about 600 lines of code left from the original Pylons framework that actually ran).
One of the things that has been really fantastic about Django is that we're still happily using it for the vast majority of code in the project, and every time Django comes out with a new release, I read the changelog and get excited about several improvements that actually make my life better. While Django has made some design decisions that I don't agree with (e.g. I'm not a fan of Django REST framework, and think it makes life more difficult), Django also makes it easy to do your own thing, which we've done to great effect (see the linked article for details on our has_request_variables
framework).
Overall I think we've gotten a ton of value out of Python and Django and would recommend it to anyone starting a new full-featured web application project today.
Hi
I’ve been using Django for the last year on and off to do my backend API. I’m getting a bit frustrated with the Django REST framework with the setup of the serializers and Django for the lack of web sockets. I’m considering either Spring or .NET Core. I’m familiar with Kotlin and C# but I’ve not built any substantial projects with them. I like OOP, building a desktop app, web API, and also the potential to get a job in the future or building a tool at work to manage my documents, dashboard and processes point cloud data.
I’m familiar with c/cpp, TypeScript.
I would love your insights on where I should go.
- Lightweight65
- Simple49
- Open source35
- Ruby20
- Great ecosystem of tools13
- Ease of use10
- If you know http you know sinatra8
- Fast5
- Large Community5
- Flexibilty and easy to use1