Hadoop vs Spark Framework

Overview

Hadoop

Stacks2.7K

Followers2.3K

Votes56

GitHub Stars15.3K

Forks9.1K

Spark Framework

Stacks39

Followers91

Votes7

GitHub Stars9.7K

Forks1.6K

Hadoop vs Spark Framework: What are the differences?

Hadoop: Open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage; Spark Framework: A micro framework for creating web applications in Kotlin and Java 8 with minimal effort. It is a simple and expressive Java/Kotlin web framework DSL built for rapid development. Its intention is to provide an alternative for Kotlin/Java developers that want to develop their web applications as expressive as possible and with minimal boilerplate.

Hadoop belongs to "Databases" category of the tech stack, while Spark Framework can be primarily classified under "Microframeworks (Backend)".

Hadoop is an open source tool with 9.4K GitHub stars and 5.85K GitHub forks. Here's a link to Hadoop's open source repository on GitHub.

Airbnb, Uber Technologies, and Netflix are some of the popular companies that use Hadoop, whereas Spark Framework is used by Kasa Smart, AfricanStockPhoto, and Khartec ltd. Hadoop has a broader approval, being mentioned in 309 company stacks & 623 developers stacks; compared to Spark Framework, which is listed in 5 company stacks and 4 developer stacks.

Share your Stack

Help developers discover the tools you use. Get visibility for your team's tech choices and contribute to the community's knowledge.

View Docs

CLI (Node.js)

Manual

Advice on Hadoop, Spark Framework

Juan José

May 1, 2020

Decided

I developed Hexagon heavily inspired in these great tools because of the following reasons:

Take full advantage of the Kotlin programming language without any strings attached to Java (as a language).
I wanted to be able to replace the HTTP server library used with different adapters (Jetty, Netty, etc.) and though right now there is only one, more are coming.
Have a complete tool to do full applications, though you can use other libraries, Hexagon comes with a dependency injection helper, settings loading from different sources and HTTP Client, so it comes with (batteries included).

Right now I'm using it for my pet projects, and I'm happy with it.

35.9k views35.9k

Comments

Detailed Comparison

Hadoop	Spark Framework
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.	It is a simple and expressive Java/Kotlin web framework DSL built for rapid development. Its intention is to provide an alternative for Kotlin/Java developers that want to develop their web applications as expressive as possible and with minimal boilerplate.
-	Built for productivity; Lets you take full advantage of the JVM
Statistics
GitHub Stars 15.3K	GitHub Stars 9.7K
GitHub Forks 9.1K	GitHub Forks 1.6K
Stacks 2.7K	Stacks 39
Followers 2.3K	Followers 91
Votes 56	Votes 7
Pros & Cons
Pros 39 Great ecosystem 11 One stack to rule them all 4 Great load balancer 1 Java syntax 1 Amazon aws	Pros 2 Very easy to get up and running. Lovely API 1 Native paralelization 1 Java 1 Easy 1 Fast
Integrations
No integrations available	Kotlin Java

What are some alternatives to Hadoop, Spark Framework?

MongoDB

MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.

MySQL

The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software.

PostgreSQL

PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions.

ExpressJS

Express is a minimal and flexible node.js web application framework, providing a robust set of features for building single and multi-page, and hybrid web applications.

Microsoft SQL Server

Microsoft® SQL Server is a database management and analysis system for e-commerce, line-of-business, and data warehousing solutions.

SQLite

SQLite is an embedded SQL database engine. Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers, and views, is contained in a single disk file.

Cassandra

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent matter. Cassandra will automatically repartition as machines are added and removed from the cluster. Row store means that like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.

Memcached

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

MariaDB

Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most featureful, stable, and sanely licensed open SQL server in the industry. MariaDB is designed as a drop-in replacement of MySQL(R) with more features, new storage engines, fewer bugs, and better performance.