Apache Hive

#37 in Databases · 475 Followers · 2 Discussions

What is Apache Hive?

Hive facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage.

Apache Hive is a tool in the Databases category of a tech stack.
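As a concrete illustration of projecting structure onto data already in storage, here is a minimal sketch: it defines an external table over files assumed to already exist in HDFS and then queries them with ordinary SQL. The table name, columns, and path are hypothetical.

```sql
-- Minimal sketch: project a schema onto tab-delimited files that are
-- assumed to already sit in HDFS; an EXTERNAL table leaves the data in place.
CREATE EXTERNAL TABLE IF NOT EXISTS page_views (
  view_time TIMESTAMP,
  user_id   BIGINT,
  url       STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/page_views';  -- hypothetical HDFS directory

-- Query the files with ordinary SQL.
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```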

Key Features

  • Built on top of Apache Hadoop
  • Tools to enable easy access to data via SQL
  • Support for extract/transform/load (ETL), reporting, and data analysis
  • Access to files stored directly in Apache HDFS or in other storage systems such as Apache HBase
  • Query execution using the Apache Hadoop MapReduce, Tez, or Spark frameworks (see the sketch after this list)
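To make the last two features concrete, here is a hedged sketch of a session that picks an execution engine and runs a simple ETL-style step. The property hive.execution.engine and its values (mr, tez, spark) are standard Hive settings; the table names are hypothetical and build on the earlier example.

```sql
-- Pick the execution engine for this session ('mr', 'tez', or 'spark');
-- tez and spark assume those engines are installed on the cluster.
SET hive.execution.engine=tez;

-- A simple ETL step: copy cleaned rows from the external staging table
-- into a managed table stored as ORC (table names are illustrative).
CREATE TABLE page_views_clean
STORED AS ORC
AS
SELECT user_id, url, to_date(view_time) AS view_date
FROM page_views
WHERE url IS NOT NULL;
```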

Apache Hive Pros & Cons

Pros of Apache Hive

No pros listed yet.

Cons of Apache Hive

No cons listed yet.

Apache Hive Alternatives & Comparisons

What are some alternatives to Apache Hive?

Apache Spark

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Splunk

It provides the leading platform for Operational Intelligence. Customers use it to search, monitor, analyze and visualize machine data.

Apache Flink

Apache Flink is an open source system for fast and versatile data analytics in clusters. Flink supports batch and streaming analytics, in one system. Analytical programs can be written in concise and elegant APIs in Java and Scala.

Amazon Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

AWS Glue

A fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.

Presto

Distributed SQL Query Engine for Big Data

Apache Hive Integrations

Mode, SQLFlow, Hadoop, Apache Spark, HBase and 7 more are some of the popular tools that integrate with Apache Hive. Here's a list of all 12 tools that integrate with Apache Hive.

  • Mode
  • SQLFlow
  • Hadoop
  • Apache Spark
  • HBase
  • Apache NiFi
  • Greenplum Database
  • Apache Parquet
  • DBeaver
  • Azure HDInsight
  • DbSchema
  • KNIME
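As one hedged illustration of these integrations, the sketch below stores a copy of a Hive table in the columnar Apache Parquet format, one of the tools listed above. The table and column names are hypothetical and build on the earlier examples.

```sql
-- Minimal sketch: write the data out in Apache Parquet, a columnar format
-- Hive can both read and write (names are illustrative).
CREATE TABLE page_views_parquet
STORED AS PARQUET
AS
SELECT view_time, user_id, url
FROM page_views;
```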

Apache Hive Discussions

Discover why developers choose Apache Hive. Read real-world technical decisions and stack choices from the StackShare community.

Jan Vlnas

Senior Software Engineer

Oct 6, 2022

Needs advice on OpenRefine, Jupyter, and Databricks

From my point of view, OpenRefine and Apache Hive serve completely different purposes. OpenRefine is intended for interactive cleaning of messy data locally. You could work with its libraries to use some of OpenRefine's features as part of your data pipeline (there are pointers in the FAQ), but OpenRefine in general is intended for single-user, local operation.

I can't recommend a particular alternative without a better understanding of your use case. But if you are looking for an interactive tool to work with big data at scale, take a look at notebook environments like Jupyter, Databricks, or Deepnote. If you are building a data processing pipeline, also consider Apache Spark.

Edit: Fixed references from Hadoop to Hive, which is actually closer to Spark.

Shehryar Mallick

Associate Data Engineer

Oct 3, 2022

Needs advice on Apache Hive, Hadoop, and OpenRefine

I've been going over the documentation and couldn't find answers to a few questions:

Apache Hive is built on top of Hadoop, meaning that if I wanted to scale it up I could scale either horizontally or vertically. But if I want to scale up OpenRefine to handle more data, how can that be achieved? The only option I could find was to allocate more memory, say 2 or 4 GB, but with that approach we would eventually run out of memory to allot. Any thoughts on this?

Secondly, Hadoop has MapReduce, meaning a task is split across many mappers running in parallel, which in turn increases processing speed. Is there a similar mechanism in OpenRefine, or does it only have a single processing unit (since it runs locally)? Thoughts?


Adoption

On StackShare, Apache Hive is used by 79 companies and 376 developers.