
Hibernate vs Apache Spark: What are the differences?

What is Hibernate? Idiomatic persistence for Java and relational databases. Hibernate is a suite of open source projects around domain models. The flagship project is Hibernate ORM, the Object Relational Mapper.
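As a rough illustration of that idiomatic persistence, here is a minimal sketch against Hibernate ORM 5.x. The Book class, its table, and the hibernate.cfg.xml settings are hypothetical, and newer Hibernate releases use the jakarta.persistence namespace in place of javax.persistence.

    // Book.java - a plain domain class mapped with annotations.
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;

    @Entity
    public class Book {
        @Id @GeneratedValue
        private Long id;
        private String title;

        protected Book() {}                      // Hibernate needs a no-arg constructor
        public Book(String title) { this.title = title; }
        public String getTitle() { return title; }
    }

    // Demo.java - persist and query the domain model without hand-written SQL.
    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;
    import org.hibernate.cfg.Configuration;

    public class Demo {
        public static void main(String[] args) {
            // Connection settings come from hibernate.cfg.xml (assumed to exist).
            try (SessionFactory factory = new Configuration().configure()
                    .addAnnotatedClass(Book.class)
                    .buildSessionFactory();
                 Session session = factory.openSession()) {
                Transaction tx = session.beginTransaction();
                session.persist(new Book("Domain-Driven Design"));
                tx.commit();

                // HQL targets the object model rather than the table.
                session.createQuery("from Book b where b.title like :t", Book.class)
                       .setParameter("t", "%Design%")
                       .getResultList()
                       .forEach(b -> System.out.println(b.getTitle()));
            }
        }
    }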

What is Apache Spark? Fast and general engine for large-scale data processing. Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
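As a concrete taste of the batch side, here is a minimal Spark job using the Java API. The input path is a placeholder; the same program runs unchanged under YARN or Spark's standalone mode when submitted with the matching --master setting.

    import org.apache.spark.api.java.function.FilterFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.SparkSession;

    public class BatchSketch {
        public static void main(String[] args) {
            // local[*] is for illustration only; pass --master yarn (or a
            // spark:// standalone URL) to spark-submit to run on a cluster.
            SparkSession spark = SparkSession.builder()
                    .appName("batch-sketch")
                    .master("local[*]")
                    .getOrCreate();

            // The same API reads HDFS, S3, local files, and other
            // Hadoop-compatible storage.
            Dataset<String> lines = spark.read().textFile("hdfs:///data/events.log");
            long errors = lines.filter((FilterFunction<String>) l -> l.contains("ERROR"))
                               .count();
            System.out.println("error lines: " + errors);

            spark.stop();
        }
    }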

Hibernate belongs to the "Object Relational Mapper (ORM)" category of the tech stack, while Apache Spark is primarily classified under "Big Data Tools".

"Easy ORM" is the top reason why over 9 developers like Hibernate, while over 45 developers mention "Open-source" as the leading cause for choosing Apache Spark.

Apache Spark is an open source tool with 22.3K GitHub stars and 19.3K GitHub forks. Here's a link to Apache Spark's open source repository on GitHub.

Slack, Shopify, and SendGrid are some of the popular companies that use Apache Spark, whereas Hibernate is used by Bodybuilding.com, StyleShare Inc., and Peewah. Apache Spark has broader approval, being mentioned in 263 company stacks & 111 developer stacks, compared to Hibernate, which is listed in 85 company stacks and 72 developer stacks.



    What are some alternatives to Hibernate and Apache Spark?
    MyBatis
    It is a first class persistence framework with support for custom SQL, stored procedures and advanced mappings. It eliminates almost all of the JDBC code and manual setting of parameters and retrieval of results. It can use simple XML or Annotations for configuration and map primitives, Map interfaces and Java POJOs (Plain Old Java Objects) to database records.
    Spring
    A key element of Spring is infrastructural support at the application level: Spring focuses on the "plumbing" of enterprise applications so that teams can focus on application-level business logic, without unnecessary ties to specific deployment environments.
    Sequelize
    Sequelize is a promise-based ORM for Node.js and io.js. It supports the dialects PostgreSQL, MySQL, MariaDB, SQLite and MSSQL and features solid transaction support, relations, read replication and more.
    SQLAlchemy
    SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.
    Doctrine 2
    Doctrine 2 sits on top of a powerful database abstraction layer (DBAL). One of its key features is the option to write database queries in a proprietary object oriented SQL dialect called Doctrine Query Language (DQL), inspired by Hibernate's HQL.
    Decisions about Hibernate and Apache Spark
    StackShare Editors · Presto, Apache Spark, Hadoop

    Around 2015, the growing use of Uber’s data exposed limitations in the ETL and Vertica-centric setup, not to mention the increasing costs. “As our company grew, scaling our data warehouse became increasingly expensive. To cut down on costs, we started deleting older, obsolete data to free up space for new data.”

    To overcome these challenges, Uber rebuilt their big data platform around Hadoop. “More specifically, we introduced a Hadoop data lake where all raw data was ingested from different online data stores only once and with no transformation during ingestion.”

    “In order for users to access data in Hadoop, we introduced Presto to enable interactive ad hoc user queries, Apache Spark to facilitate programmatic access to raw data (in both SQL and non-SQL formats), and Apache Hive to serve as the workhorse for extremely large queries.”
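    A hedged sketch of that "both SQL and non-SQL" programmatic access, using Spark's Java API; the lake path, view name, and columns are hypothetical.

        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.SparkSession;

        public class LakeAccessSketch {
            public static void main(String[] args) {
                SparkSession spark = SparkSession.builder().appName("lake-access").getOrCreate();

                // Raw data ingested once into the Hadoop data lake.
                Dataset<Row> trips = spark.read().parquet("hdfs:///datalake/trips");
                trips.createOrReplaceTempView("trips");

                // SQL-style access...
                spark.sql("SELECT city, count(*) AS n FROM trips GROUP BY city").show();

                // ...and the equivalent non-SQL (DataFrame API) access.
                trips.groupBy("city").count().show();

                spark.stop();
            }
        }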

    StackShare Editors · Presto, Apache Spark, Hadoop

    To improve platform scalability and efficiency, Uber transitioned from JSON to Parquet, and built a central schema service to manage schemas and integrate different client libraries.
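    A minimal sketch of that JSON-to-Parquet move with Spark's Java API (paths are hypothetical): Spark infers the nested schema from the JSON records and preserves it in the columnar Parquet output.

        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.SparkSession;

        public class ParquetSketch {
            public static void main(String[] args) {
                SparkSession spark = SparkSession.builder().appName("json-to-parquet").getOrCreate();

                // Schema inference keeps nested structs intact.
                Dataset<Row> raw = spark.read().json("hdfs:///ingest/raw_events/");
                raw.printSchema();

                // Columnar Parquet preserves the original, nested format while
                // compressing and scanning far better than row-oriented JSON.
                raw.write().mode("overwrite").parquet("hdfs:///datalake/events_parquet/");

                spark.stop();
            }
        }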

    While the first generation big data platform was vulnerable to upstream data format changes, “ad hoc data ingestion jobs were replaced with a standard platform to transfer all source data in its original, nested format into the Hadoop data lake.”

    These platform changes addressed the scaling challenges Uber was facing around that time: “On a daily basis, there were tens of terabytes of new data added to our data lake, and our Big Data platform grew to over 10,000 vcores with over 100,000 running batch jobs on any given day.”

    StackShare Editors · Presto, Apache Spark, Scala, MySQL, Kafka

    Slack’s data team works to “provide an ecosystem to help people in the company quickly and easily answer questions about usage, so they can make better and data informed decisions.” To achieve that goal, they rely on a complex data pipeline.

    An in-house tool called Sqooper scrapes MySQL backups and pipes them to S3. Job queue and log data is sent to Kafka, then persisted to S3 using an open source tool called Secor, which was created by Pinterest.

    For compute, Amazon’s Elastic MapReduce (EMR) creates clusters preconfigured for Presto, Hive, and Spark.

    Presto is then used for ad-hoc questions, validating data assumptions, exploring smaller datasets, and creating visualizations for some internal tools. Hive is used for larger data sets or longer time series data, and Spark allows teams to write efficient and robust batch and aggregation jobs. Most of the Spark pipeline is written in Scala.
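    Slack writes these jobs in Scala; purely for consistency with the other examples here, this hypothetical daily aggregation over job-queue logs is sketched with Spark's Java API (the bucket layout and column names are invented).

        import static org.apache.spark.sql.functions.*;

        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.SparkSession;

        public class JobStatsSketch {
            public static void main(String[] args) {
                SparkSession spark = SparkSession.builder().appName("job-stats").getOrCreate();

                Dataset<Row> logs = spark.read().parquet("s3a://bucket/job-queue-logs/");

                // Batch aggregation: run count and mean duration per job type per day.
                Dataset<Row> daily = logs
                        .groupBy(col("job_type"), to_date(col("ts")).as("day"))
                        .agg(count("*").as("runs"),
                             avg("duration_ms").as("avg_duration_ms"));

                daily.write().mode("overwrite").parquet("s3a://bucket/aggregates/daily_job_stats/");

                spark.stop();
            }
        }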

    Thrift binds all of these engines together with a typed schema and structured data.

    Finally, the Hive Metastore serves as the ground truth for all data and its schema.

    StackShare Editors · Apache Thrift, Kotlin, Presto, HHVM (HipHop Virtual Machine), gRPC, Kubernetes, Apache Spark, Airflow, Terraform, Hadoop, Swift, Hack, Memcached, Consul, Chef, Prometheus

    Since the beginning, Cal Henderson has been the CTO of Slack. Earlier this year, he commented on a Quora question summarizing their current stack.

    Apps
    • Web: a mix of JavaScript/ES6 and React.
    • Desktop: Electron, used to ship the web app as a desktop application.
    • Android: a mix of Java and Kotlin.
    • iOS: written in a mix of Objective C and Swift.
    Backend
    • The core application and the API are written in PHP/Hack and run on HHVM.
    • The data is stored in MySQL using Vitess.
    • Caching is done using Memcached and MCRouter.
    • The search service is backed by SolrCloud, with various Java services around it.
    • The messaging system uses WebSockets with many services in Java and Go.
    • Load balancing is done using HAproxy with Consul for configuration.
    • Most services talk to each other over gRPC; some use Thrift and JSON-over-HTTP.
    • Voice and video calling service was built in Elixir.
    Data warehouse
    • Built using open source tools including Presto, Spark, Airflow, Hadoop and Kafka.
    Eric Colson, Chief Algorithms Officer at Stitch Fix · 19 upvotes · 275.8K views
    Amazon EC2 Container Service, Docker, PyTorch, R, Python, Presto, Apache Spark, Amazon S3, PostgreSQL, Kafka
    #Data #DataStack #DataScience #ML #Etl #AWS

    The algorithms and data infrastructure at Stitch Fix are housed in #AWS. Data acquisition is split between events flowing through Kafka and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3-based data warehouse. Apache Spark on Yarn is our tool of choice for data movement and #ETL. Because our storage layer (S3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling Yarn clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for ad hoc queries and dashboards.
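    A hedged sketch of that decoupled pattern with Spark's Java API, with an invented bucket layout: the job reads snapshots and event dumps from S3, transforms them, and writes results straight back to S3, so the Yarn clusters themselves hold no state and can be resized freely.

        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.SparkSession;

        public class S3EtlSketch {
            public static void main(String[] args) {
                SparkSession spark = SparkSession.builder().appName("s3-etl").getOrCreate();

                // Periodic Postgres snapshots and Kafka event dumps, both landed in S3.
                Dataset<Row> clients = spark.read().parquet("s3a://warehouse/postgres_snapshots/clients/");
                Dataset<Row> orders  = spark.read().parquet("s3a://warehouse/kafka_events/orders/");

                // Results go straight back to S3; the compute cluster is disposable.
                orders.join(clients, "client_id")
                      .write().mode("overwrite")
                      .parquet("s3a://warehouse/derived/orders_enriched/");

                spark.stop();
            }
        }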

    Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).

    At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated into systems. That requires a serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize those models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn), by automatically packaging them as Docker containers and deploying to Amazon ECS. This provides our data scientists a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.


    #DataScience #DataStack #Data

    How developers use Hibernate and Apache Spark
    Kang Hyeon Ku uses Hibernate

    When I built queries with MyBatis and hung conditional branches and for loops off them until the queries became unreadable, I started to wonder whether that approach made sense. That is when I decided to try an ORM, which I only recently began learning. It genuinely helps make development comfortable. Still, since it ultimately issues queries to do the mapping, I often felt that using it without understanding the underlying queries would end in disaster.

    Analytical Informatics uses Hibernate

    We use a Clojure-powered wrapper around Hibernate to provide an ORM access to our data store for applications, as well as offering SSO integration and HIPAA logging functionality.

    Wei Chen uses Apache Spark

    Spark is good at managing parallel data processing. We wrote a neat program to handle the TBs of data we get every day.

    Tongliang Liu uses Hibernate

    You can't escape it when you're on the Java stack dealing with a relational DB.

    Satoru Ishikawa uses Hibernate

    As the Object Relational Mapper in Java web app frameworks such as Struts and Spring.

    icarus-dave uses Hibernate

    Persistence layer for backend data; maps entities to the database.

    Ralic Lo uses Apache Spark

    Used the Spark DataFrame API via SparkR for big data analysis.

    BrainFinance uses Apache Spark

    As part of our big data machine learning stack (SMACK).

    Kalibrr uses Apache Spark

    We use Apache Spark in computing our recommendations.

    Dotmetrics uses Apache Spark

    Big data analytics and nightly transformation jobs.

    How much does Hibernate cost? Pricing unavailable.
    How much does Apache Spark cost? Pricing unavailable.