What is Spark Framework and what are its top alternatives?
Top Alternatives to Spark Framework
- Apache Spark
Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning. ...
- Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. ...
- Spring
A key element of Spring is infrastructural support at the application level: Spring focuses on the "plumbing" of enterprise applications so that teams can focus on application-level business logic, without unnecessary ties to specific deployment environments. ...
- Spring Boot
Spring Boot makes it easy to create stand-alone, production-grade Spring based Applications that you can "just run". We take an opinionated view of the Spring platform and third-party libraries so you can get started with minimum fuss. Most Spring Boot applications need very little Spring configuration. ...
- ExpressJS
Express is a minimal and flexible node.js web application framework, providing a robust set of features for building single and multi-page, and hybrid web applications. ...
- Flask
Flask is intended for getting started very quickly and was developed with best intentions in mind. ...
- Django REST framework
It is a powerful and flexible toolkit that makes it easy to build Web APIs.
- Sinatra
Sinatra is a DSL for quickly creating web applications in Ruby with minimal effort. ...
Spark Framework alternatives & related posts
- Open-source61
- Fast and Flexible48
- One platform for every big data problem8
- Great for distributed SQL like applications8
- Easy to install and to use6
- Works well for most Datascience usecases3
- Interactive Query2
- Machine learning libratimery, Streaming in real2
- In memory Computation2
- Speed4
related Apache Spark posts
The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka, and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on Yarn is our tool of choice for data movement and #ETL. Because our storage layer (s3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling Yarn clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for adhoc queries and dashboards.
Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).
At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated systems. That requires serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize those models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn), by automatically packaging them as Docker containers and deploying to Amazon ECS. This provides our data scientist a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.
For more info:
- Our Algorithms Tour: https://algorithms-tour.stitchfix.com/
- Our blog: https://multithreaded.stitchfix.com/blog/
- Careers: https://multithreaded.stitchfix.com/careers/
#DataScience #DataStack #Data
Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop :
Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse to any sink leveraging the use of Apache Spark . The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:
https://eng.uber.com/marmaray-hadoop-ingestion-open-source/
(Direct GitHub repo: https://github.com/uber/marmaray Kafka Kafka Manager )
- Great ecosystem39
- One stack to rule them all11
- Great load balancer4
- Amazon aws1
- Java syntax1
related Hadoop posts
The early data ingestion pipeline at Pinterest used Kafka as the central message transporter, with the app servers writing messages directly to Kafka, which then uploaded log files to S3.
For databases, a custom Hadoop streamer pulled database data and wrote it to S3.
Challenges cited for this infrastructure included high operational overhead, as well as potential data loss occurring when Kafka broker outages led to an overflow of in-memory message buffering.
Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop :
Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse to any sink leveraging the use of Apache Spark . The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:
https://eng.uber.com/marmaray-hadoop-ingestion-open-source/
(Direct GitHub repo: https://github.com/uber/marmaray Kafka Kafka Manager )
Spring
- Java230
- Open source157
- Great community136
- Very powerful123
- Enterprise114
- Lot of great subprojects64
- Easy setup60
- Convention , configuration, done44
- Standard40
- Love the logic31
- Good documentation13
- Dependency injection11
- Stability11
- MVC9
- Easy6
- Makes the hard stuff fun & the easy stuff automatic3
- Strong typing3
- Code maintenance2
- Best practices2
- Maven2
- Great Desgin2
- Easy Integration with Spring Security2
- Integrations with most other Java frameworks2
- Java has more support and more libraries1
- Supports vast databases1
- Large ecosystem with seamless integration1
- OracleDb integration1
- Live project1
- Draws you into its own ecosystem and bloat15
- Verbose configuration3
- Poor documentation3
- Java3
- Java is more verbose language in compare to python2
related Spring posts
Is learning Spring and Spring Boot for web apps back-end development is still relevant in 2021? Feel free to share your views with comparison to Django/Node.js/ ExpressJS or other frameworks.
Please share some good beginner resources to start learning about spring/spring boot framework to build the web apps.
I am consulting for a company that wants to move its current CubeCart e-commerce site to another PHP based platform like PrestaShop or Magento. I was interested in alternatives that utilize Node.js as the primary platform. I currently don't know PHP, but I have done full stack dev with Java, Spring, Thymeleaf, etc.. I am just unsure that learning a set of technologies not commonly used makes sense. For example, in PrestaShop, I would need to work with JavaScript better and learn PHP, Twig, and Bootstrap. It seems more cumbersome than a Node JS system, where the language syntax stays the same for the full stack. I am looking for thoughts and advice on the relevance of PHP skillset into the future AND whether the Node based e-commerce open source options can compete with Magento or Prestashop.
Spring Boot
- Powerful and handy149
- Easy setup134
- Java128
- Spring90
- Fast85
- Extensible46
- Lots of "off the shelf" functionalities37
- Cloud Solid32
- Caches well26
- Productive24
- Many receipes around for obscure features24
- Modular23
- Integrations with most other Java frameworks23
- Spring ecosystem is great22
- Auto-configuration21
- Fast Performance With Microservices21
- Community18
- Easy setup, Community Support, Solid for ERP apps17
- One-stop shop15
- Easy to parallelize14
- Cross-platform14
- Easy setup, good for build erp systems, well documented13
- Powerful 3rd party libraries and frameworks13
- Easy setup, Git Integration12
- It's so easier to start a project on spring5
- Kotlin4
- Microservice and Reactive Programming1
- The ability to integrate with the open source ecosystem1
- Heavy weight23
- Annotation ceremony18
- Java13
- Many config files needed11
- Reactive5
- Excellent tools for cloud hosting, since 5.x4
- Java 😒😒1
related Spring Boot posts
We are in the process of building a modern content platform to deliver our content through various channels. We decided to go with Microservices architecture as we wanted scale. Microservice architecture style is an approach to developing an application as a suite of small independently deployable services built around specific business capabilities. You can gain modularity, extensive parallelism and cost-effective scaling by deploying services across many distributed servers. Microservices modularity facilitates independent updates/deployments, and helps to avoid single point of failure, which can help prevent large-scale outages. We also decided to use Event Driven Architecture pattern which is a popular distributed asynchronous architecture pattern used to produce highly scalable applications. The event-driven architecture is made up of highly decoupled, single-purpose event processing components that asynchronously receive and process events.
To build our #Backend capabilities we decided to use the following: 1. #Microservices - Java with Spring Boot , Node.js with ExpressJS and Python with Flask 2. #Eventsourcingframework - Amazon Kinesis , Amazon Kinesis Firehose , Amazon SNS , Amazon SQS, AWS Lambda 3. #Data - Amazon RDS , Amazon DynamoDB , Amazon S3 , MongoDB Atlas
To build #Webapps we decided to use Angular 2 with RxJS
#Devops - GitHub , Travis CI , Terraform , Docker , Serverless
Is learning Spring and Spring Boot for web apps back-end development is still relevant in 2021? Feel free to share your views with comparison to Django/Node.js/ ExpressJS or other frameworks.
Please share some good beginner resources to start learning about spring/spring boot framework to build the web apps.
ExpressJS
- Simple380
- Node.js336
- Javascript244
- High performance193
- Robust routing152
- Middlewares73
- Open source71
- Great community59
- Hybrid web applications37
- Well documented16
- Sinatra inspired9
- Rapid development9
- Socket connection7
- Isomorphic js.. superfast and easy7
- Light weight5
- Resource available for learning4
- Npm4
- Event loop3
- Callbacks3
- Data stream2
- Not python27
- Overrated17
- No multithreading14
- Javascript9
- Not fast5
- Easily Insecure for Novices2
related ExpressJS posts
Our whole Node.js backend stack consists of the following tools:
- Lerna as a tool for multi package and multi repository management
- npm as package manager
- NestJS as Node.js framework
- TypeScript as programming language
- ExpressJS as web server
- Swagger UI for visualizing and interacting with the API’s resources
- Postman as a tool for API development
- TypeORM as object relational mapping layer
- JSON Web Token for access token management
The main reason we have chosen Node.js over PHP is related to the following artifacts:
- Made for the web and widely in use: Node.js is a software platform for developing server-side network services. Well-known projects that rely on Node.js include the blogging software Ghost, the project management tool Trello and the operating system WebOS. Node.js requires the JavaScript runtime environment V8, which was specially developed by Google for the popular Chrome browser. This guarantees a very resource-saving architecture, which qualifies Node.js especially for the operation of a web server. Ryan Dahl, the developer of Node.js, released the first stable version on May 27, 2009. He developed Node.js out of dissatisfaction with the possibilities that JavaScript offered at the time. The basic functionality of Node.js has been mapped with JavaScript since the first version, which can be expanded with a large number of different modules. The current package managers (npm or Yarn) for Node.js know more than 1,000,000 of these modules.
- Fast server-side solutions: Node.js adopts the JavaScript "event-loop" to create non-blocking I/O applications that conveniently serve simultaneous events. With the standard available asynchronous processing within JavaScript/TypeScript, highly scalable, server-side solutions can be realized. The efficient use of the CPU and the RAM is maximized and more simultaneous requests can be processed than with conventional multi-thread servers.
- A language along the entire stack: Widely used frameworks such as React or AngularJS or Vue.js, which we prefer, are written in JavaScript/TypeScript. If Node.js is now used on the server side, you can use all the advantages of a uniform script language throughout the entire application development. The same language in the back- and frontend simplifies the maintenance of the application and also the coordination within the development team.
- Flexibility: Node.js sets very few strict dependencies, rules and guidelines and thus grants a high degree of flexibility in application development. There are no strict conventions so that the appropriate architecture, design structures, modules and features can be freely selected for the development.
Repost
Overview: To put it simply, we plan to use the MERN stack to build our web application. MongoDB will be used as our primary database. We will use ExpressJS alongside Node.js to set up our API endpoints. Additionally, we plan to use React to build our SPA on the client side and use Redis on the server side as our primary caching solution. Initially, while working on the project, we plan to deploy our server and client both on Heroku . However, Heroku is very limited and we will need the benefits of an Infrastructure as a Service so we will use Amazon EC2 to later deploy our final version of the application.
Serverside: nodemon will allow us to automatically restart a running instance of our node app when files changes take place. We decided to use MongoDB because it is a non relational database which uses the Document Object Model. This allows a lot of flexibility as compared to a RDMS like SQL which requires a very structural model of data that does not change too much. Another strength of MongoDB is its ease in scalability. We will use Mongoose along side MongoDB to model our application data. Additionally, we will host our MongoDB cluster remotely on MongoDB Atlas. Bcrypt will be used to encrypt user passwords that will be stored in the DB. This is to avoid the risks of storing plain text passwords. Moreover, we will use Cloudinary to store images uploaded by the user. We will also use the Twilio SendGrid API to enable automated emails sent by our application. To protect private API endpoints, we will use JSON Web Token and Passport. Also, PayPal will be used as a payment gateway to accept payments from users.
Client Side: As mentioned earlier, we will use React to build our SPA. React uses a virtual DOM which is very efficient in rendering a page. Also React will allow us to reuse components. Furthermore, it is very popular and there is a large community that uses React so it can be helpful if we run into issues. We also plan to make a cross platform mobile application later and using React will allow us to reuse a lot of our code with React Native. Redux will be used to manage state. Redux works great with React and will help us manage a global state in the app and avoid the complications of each component having its own state. Additionally, we will use Bootstrap components and custom CSS to style our app.
Other: Git will be used for version control. During the later stages of our project, we will use Google Analytics to collect useful data regarding user interactions. Moreover, Slack will be our primary communication tool. Also, we will use Visual Studio Code as our primary code editor because it is very light weight and has a wide variety of extensions that will boost productivity. Postman will be used to interact with and debug our API endpoints.
Flask
- Flexibilty14
- For it flexibility10
- Flexibilty and easy to use9
- Flask8
- User friendly7
- Secured6
- Unopinionated5
- Orm3
- Secure2
- Beautiful code1
- Easy to get started1
- Easy to develop and maintain applications1
- Not JS1
- Easy to use1
- Documentation1
- Python1
- Minimal1
- Lightweight1
- Easy to setup and get it going1
- Perfect for small to large projects with superb docs.1
- Easy to integrate1
- Speed1
- Get started quickly1
- Customizable1
- Simple to use1
- Powerful1
- Rapid development1
- Open source0
- Well designed0
- Productive0
- Awesome0
- Expressive0
- Love it0
- Not JS10
- Context7
- Not fast5
- Don't has many module as in spring1
related Flask posts
One of our top priorities at Pinterest is fostering a safe and trustworthy experience for all Pinners. As Pinterest’s user base and ads business grow, the review volume has been increasing exponentially, and more content types require moderation support. To solve greater engineering and operational challenges at scale, we needed a highly-reliable and performant system to detect, report, evaluate, and act on abusive content and users and so we created Pinqueue.
Pinqueue-3.0 serves as a generic platform for content moderation and human labeling. Under the hood, Pinqueue3.0 is a Flask + React app powered by Pinterest’s very own Gestalt UI framework. On the backend, Pinqueue3.0 heavily relies on PinLater, a Pinterest-built reliable asynchronous job execution system, to handle the requests for enqueueing and action-taking. Using PinLater has significantly strengthened Pinqueue3.0’s overall infra with its capability of processing a massive load of events with configurable retry policies.
Hundreds of millions of people around the world use Pinterest to discover and do what they love, and our job is to protect them from abusive and harmful content. We’re committed to providing an inspirational yet safe experience to all Pinners. Solving trust & safety problems is a joint effort requiring expertise across multiple domains. Pinqueue3.0 not only plays a critical role in responsively taking down unsafe content, it also has become an enabler for future ML/automation initiatives by providing high-quality human labels. Going forward, we will continue to improve the review experience, measure review quality and collaborate with our machine learning teams to solve content moderation beyond manual reviews at an even larger scale.
Hey, so I developed a basic application with Python. But to use it, you need a python interpreter. I want to add a GUI to make it more appealing. What should I choose to develop a GUI? I have very basic skills in front end development (CSS, JavaScript). I am fluent in python. I'm looking for a tool that is easy to use and doesn't require too much code knowledge. I have recently tried out Flask, but it is kinda complicated. Should I stick with it, move to Django, or is there another nice framework to use?
- Easy to use66
- Browsable api65
- Great documentation53
- Customizable50
- Fast development42
- Easy to use, customizable, pluggable, serializer9
- Python8
- Django ORM7
- FastSerialize5
- Less code3
- Easy implementation2
- Bad documentation2
- Reimplements Django functionality2
- No support for URL Namespaces1
- Bad CSRF handling0
related Django REST framework posts
Hey everyone! I'm planning on building a personal project - this will be my first full-stack project and will be a web app.
The way it will work is that users will be able to post groups. This can be, groups for studying or groups for work, etc. They can also set the desired group size (e.g. limit the group to 3 members). Other users can then join said group - once the group is full, it will automatically close.
What tech stack would you all recommend for this? I have a lot of experience with Django so maybe that will be good for the backend but I'm not sure where to go from there. I've heard using the Django REST framework with a React frontend might be good. Always open to learning new technologies and thanks in advance!
I am planning on creating an application using the following tech-stack. Vue.js (TypeScript) for the front-end, Django (specifically Django REST framework) for the server-side work, and using PostgreSQL as the database. Is there any reason NOT to use this tech stack mentioned or are there better options? Without giving away too much info, my app will be logging information from the user, displaying this information, setting goals, displaying visual graphs, a friend system where you can add other people etc...
- Lightweight65
- Simple50
- Open source35
- Ruby20
- Great ecosystem of tools13
- Ease of use10
- If you know http you know sinatra8
- Large Community5
- Fast5
- Flexibilty and easy to use1