© 2025 StackShare. All rights reserved.

Apache Parquet vs Oracle

Overview

Oracle: used in 2.6K stacks, 1.8K followers, 113 votes
Apache Parquet: used in 97 stacks, 190 followers, 0 votes

Apache Parquet vs Oracle: What are the differences?

Introduction

This comparison outlines the key differences between Apache Parquet and Oracle. Apache Parquet is a columnar storage file format designed to work with big data processing frameworks such as Apache Hadoop and Apache Spark. Oracle Database, on the other hand, is a widely used relational database management system.

  1. Data Organization: Apache Parquet stores data column by column: within each row group, the values of each column are written together in a column chunk. Keeping similar values together makes compression and encoding more effective, and a query that needs only a few columns can skip the rest, reducing I/O. Oracle, by contrast, stores table data in row-oriented blocks by default, keeping all the values of a row together. This suits transactional (OLTP) workloads, which typically read and write whole rows.

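  2. Data Compression: Apache Parquet supports several compression codecs, including Snappy, Gzip, LZO, Brotli, LZ4, and Zstandard, which can be chosen to suit the data and workload. Combined with its encodings, this reduces storage space and the amount of data read per query. Oracle also supports compression, including basic table compression, Advanced Row Compression, and Hybrid Columnar Compression on supported storage such as Exadata, but the more advanced modes depend on the database edition, extra-cost licensing, or specific hardware.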

  3. Schema Evolution: Each Parquet file stores its own schema in its footer, so new files can be written with additional columns without rewriting existing ones; readers that support schema merging (Apache Spark, for example) reconcile the file schemas and return nulls for columns an older file lacks. Oracle manages the schema centrally: changes are made with DDL such as ALTER TABLE, which applies to the whole table and may require planning around existing data and dependent objects.

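  4. Query Performance: For analytical queries over large datasets, Parquet's columnar layout, compression, and per-column statistics let a query engine read only the columns it needs and skip row groups that cannot match a filter. Because Parquet is a file format, actual performance depends on the engine reading it (Spark, Hive, Impala, and so on). Oracle, as a row-oriented RDBMS optimized for transactional work, may be slower than a Parquet-based engine for large analytical scans, although features such as Oracle Database In-Memory (an in-memory column store) are designed to narrow that gap.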

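  5. Data Types: Apache Parquet defines a small set of physical types (such as BOOLEAN, INT32, INT64, FLOAT, DOUBLE, and BYTE_ARRAY), which logical type annotations extend to strings, dates, timestamps, decimals, and more; it also supports nested structures such as lists, maps, and structs. Oracle supports a wide range of data types as well (for example NUMBER, VARCHAR2, DATE, TIMESTAMP, CLOB, and BLOB), along with user-defined object types, but its type system is organized around relational tables rather than nested records.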

  6. Ecosystem Integration: Apache Parquet is well-integrated with big data processing frameworks like Apache Hadoop and Apache Spark. It is a widely adopted format in the Hadoop ecosystem, making it easier to integrate into existing data processing workflows. Oracle, being a standalone database system, may require additional configurations or connectors to integrate with big data frameworks.
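The difference in data organization (item 1) can be illustrated with a toy model in Python. This is a minimal sketch under simplifying assumptions, not how Parquet or Oracle lay out bytes on disk: `to_columns`, `sum_row_layout`, and `sum_column_layout` are hypothetical helpers, "values touched" stands in for I/O, and every record is assumed to have the same fields.

```python
from typing import Any

# Hypothetical sample data: three records with identical fields.
SAMPLE_ROWS: list[dict[str, Any]] = [
    {"id": 1, "region": "EU", "amount": 10},
    {"id": 2, "region": "US", "amount": 25},
    {"id": 3, "region": "EU", "amount": 5},
]


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list of values per column."""
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_layout(rows: list[dict[str, Any]], column: str) -> tuple[int, int]:
    """Row layout: whole rows are read, so every field of every row is touched.

    Returns (sum of the column, number of values touched).
    """
    touched = sum(len(row) for row in rows)
    total = sum(row[column] for row in rows)
    return total, touched


def sum_column_layout(columns: dict[str, list[Any]], column: str) -> tuple[int, int]:
    """Column layout: only the requested column's values are read."""
    values = columns[column]
    return sum(values), len(values)
```

With the sample data, `sum_row_layout(SAMPLE_ROWS, "amount")` returns `(40, 9)`, while `sum_column_layout(to_columns(SAMPLE_ROWS), "amount")` returns `(40, 3)`: the same answer from a third of the values. The trade-off runs the other way for OLTP: fetching or updating one complete record is a single contiguous access in the row layout but one access per column in the columnar one.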
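Part of Parquet's compression advantage (item 2) comes from encoding values before a codec runs: a low-cardinality column can be dictionary-encoded, the resulting indices run-length encoded, and a codec such as Snappy or Gzip then applied to the encoded pages. The sketch below shows only the idea. Parquet's real format uses an RLE/bit-packing hybrid with a specific binary layout, and the function names here are hypothetical.

```python
from collections.abc import Hashable
from typing import TypeVar

T = TypeVar("T", bound=Hashable)


def dictionary_encode(values: list[T]) -> tuple[list[T], list[int]]:
    """Store each distinct value once and replace values with dictionary indices."""
    dictionary: list[T] = []
    index_of: dict[T, int] = {}
    indices: list[int] = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def run_length_encode(indices: list[int]) -> list[tuple[int, int]]:
    """Collapse consecutive repeats into (index, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for index in indices:
        if runs and runs[-1][0] == index:
            runs[-1] = (index, runs[-1][1] + 1)
        else:
            runs.append((index, 1))
    return runs


def decode(dictionary: list[T], runs: list[tuple[int, int]]) -> list[T]:
    """Reverse both steps to recover the original column values."""
    values: list[T] = []
    for index, length in runs:
        values.extend([dictionary[index]] * length)
    return values
```

For `["EU", "EU", "EU", "US", "US", "EU"]`, `dictionary_encode` returns `(["EU", "US"], [0, 0, 0, 1, 1, 0])`, and `run_length_encode` on those indices returns `[(0, 3), (1, 2), (0, 1)]`. Sorted or clustered columns produce longer runs and therefore smaller output.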
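Schema evolution (item 3) works because each Parquet file is self-describing. A reader that merges file schemas can be sketched as follows. This is a simplified model under stated assumptions: schemas are plain name-to-type dictionaries, same-named columns must have identical types, and `merge_schemas` and `project` are hypothetical names rather than a real library API. Engines such as Spark apply richer compatibility rules.

```python
from typing import Any

Schema = dict[str, str]  # column name -> type name (simplified)


def merge_schemas(schemas: list[Schema]) -> Schema:
    """Union the columns of several file schemas, keeping first-seen order."""
    merged: Schema = {}
    for schema in schemas:
        for name, type_name in schema.items():
            existing = merged.setdefault(name, type_name)
            if existing != type_name:
                raise TypeError(
                    f"column {name!r} is {existing} in one file and {type_name} in another"
                )
    return merged


def project(rows: list[dict[str, Any]], schema: Schema) -> list[dict[str, Any]]:
    """Read rows through the merged schema; missing columns come back as None."""
    return [{name: row.get(name) for name in schema} for row in rows]
```

Merging `{"id": "int64", "name": "string"}` (an older file) with `{"id": "int64", "name": "string", "email": "string"}` (a newer one) yields all three columns, and rows from the older file are returned with `email` set to `None`; the older file never has to be rewritten.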

In summary, Apache Parquet provides efficient columnar storage, per-file schemas that can evolve over time, strong performance for large analytical scans, and tight integration with big data processing frameworks. Oracle offers traditional row-based storage suited to transactional workloads, compression features that vary by edition and licensing, and centrally managed schemas; it may also require additional setup to integrate with big data ecosystems.

Advice on Oracle, Apache Parquet

Daniel, Data Engineer at Dimensigon (decision, Jul 18, 2020)

We chose Tibero over Oracle because we want to offer PL/SQL-as-a-Service that users can deploy from our website to any cloud at a standard cost, without licensing concerns. With Oracle Database, developers would have to track which features they use and what each one costs, whereas Tibero's licensing model is a single price with all features included, so neither we nor the developers using our SQLaaS have to worry about it. PostgreSQL would have been the open-source option, but we need to offer an SQLaaS with encryption and other enterprise features in the background, and the best-value option we found for PL/SQL-based applications was Tibero Database.

Abigail (decision, Dec 6, 2019)

In the field of bioinformatics, we regularly work with hierarchical and unstructured document data. Unstructured text data from PDFs, image data from radiographs, phylogenetic trees and cladograms, network graphs, streaming ECG data... none of it fits into a traditional SQL database particularly well. As such, we prefer to use document oriented databases.

MongoDB is probably the oldest component in our stack besides JavaScript, having been in it for over 5 years. At the time, we were looking for a technology that could simply cache our data visualization state (stored in JSON) in a database as-is, without any destructive normalization. MongoDB was the perfect tool, and it has been exceeding expectations ever since.

Trivia fact: some of the earliest electronic medical records (EMRs) used a document-oriented database called MUMPS as early as the 1960s, prior to the invention of SQL. MUMPS is still in use today in systems like Epic and VistA, and stores upwards of 40% of all medical records at hospitals. So, we saw MongoDB as something of a 21st-century version of the MUMPS database.

Abigail (decision, Dec 10, 2019)

We wanted a JSON datastore that could save the state of our bioinformatics visualizations without destructive normalization. As a leading NoSQL data storage technology, MongoDB has been a perfect fit for our needs. Plus it's open source, and it has an enterprise SLA scale-out path, with support for hosted solutions like Atlas. Mongo has been an absolute champ, so much so that SQL databases such as Oracle have begun shipping JSON column types as a new feature. And when Fast Healthcare Interoperability Resources (FHIR) announced support for JSON, we basically had our FHIR datalake technology.


Detailed Comparison

Oracle

Oracle Database is an RDBMS. An RDBMS that implements object-oriented features such as user-defined types, inheritance, and polymorphism is called an object-relational database management system (ORDBMS). Oracle Database has extended the relational model to an object-relational model, making it possible to store complex business models in a relational database.

Key features: none listed.

Apache Parquet

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model, or programming language.

Key features:
  • Columnar storage format
  • Type-specific encoding
  • Pig integration
  • Cascading integration
  • Crunch integration
  • Apache Arrow integration
  • Scrooge integration
  • Adaptive dictionary encoding
  • Predicate pushdown
  • Column stats
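The "Predicate pushdown" and "Column stats" features listed for Parquet work together: a writer records statistics such as the minimum and maximum value of each column chunk, and a reader can skip any row group whose statistics prove that no row can satisfy a filter. The toy model below illustrates that skipping logic under simplifying assumptions (a single in-memory integer column, a hypothetical `RowGroup` type, and only a `value > threshold` filter); it is not the Parquet reader API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroup:
    """One row group of a single integer column, plus its write-time statistics."""
    values: tuple[int, ...]
    min_value: int
    max_value: int


def write_row_groups(values: list[int], group_size: int) -> list[RowGroup]:
    """Split a column into fixed-size row groups, recording min/max for each."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups = []
    for start in range(0, len(values), group_size):
        chunk = tuple(values[start:start + group_size])
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_greater_than(groups: list[RowGroup], threshold: int) -> tuple[list[int], int]:
    """Return the values > threshold and the number of row groups actually read."""
    matches: list[int] = []
    groups_read = 0
    for group in groups:
        if group.max_value <= threshold:
            continue  # statistics prove no value here can match: skip the group
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read
```

With `groups = write_row_groups([1, 2, 3, 4, 10, 11, 12, 13, 5, 6, 7, 8], 4)`, `scan_greater_than(groups, 9)` returns `([10, 11, 12, 13], 1)`: two of the three row groups are skipped on their statistics alone. Skipping works best when data is sorted or clustered on the filtered column; if every row group spans the full value range, nothing can be skipped.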

Pros & Cons

Oracle pros (votes):
  • Reliable (44)
  • Enterprise (33)
  • High Availability (15)
  • Hard to maintain (5)
  • Expensive (5)

Oracle cons (votes):
  • Expensive (14)

Apache Parquet: no community feedback yet.

Integrations

Oracle: no integrations listed.

Apache Parquet:
  • Hadoop
  • Java
  • Apache Impala
  • Apache Thrift
  • Apache Hive
  • Pig

What are some alternatives to Oracle, Apache Parquet?

MongoDB

MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.

MySQL

The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software.

PostgreSQL

PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions.

Microsoft SQL Server

Microsoft® SQL Server is a database management and analysis system for e-commerce, line-of-business, and data warehousing solutions.

SQLite

SQLite is an embedded SQL database engine. Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers, and views, is contained in a single disk file.
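To make the single-file model concrete, the sketch below uses Python's built-in `sqlite3` module to create tables, an index, a trigger, and a view, then lists them from the file's own catalog (`sqlite_master`). No server is started; the library reads and writes the file directly. The schema and the `build_database` and `list_objects` helpers are hypothetical examples.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema: three tables, an index, an audit trigger, and a view.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL
);
CREATE TABLE audit (order_id INTEGER);
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE TRIGGER trg_order_audit AFTER INSERT ON orders
BEGIN
    INSERT INTO audit (order_id) VALUES (NEW.id);
END;
CREATE VIEW order_totals AS
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name;
"""


def build_database(path: str) -> None:
    """Create the schema and a little data in the single file at `path`."""
    conn = sqlite3.connect(path)  # opens (or creates) the file; no server process
    try:
        conn.executescript(SCHEMA)
        conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
        conn.executemany(
            "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
            [(1, 10.0), (1, 2.5)],
        )
        conn.commit()
    finally:
        conn.close()


def list_objects(path: str) -> list[tuple[str, str]]:
    """Return (type, name) for every schema object stored in the file."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(
            "SELECT type, name FROM sqlite_master ORDER BY type, name"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "shop.db")
        build_database(db_path)
        for obj_type, name in list_objects(db_path):
            print(obj_type, name)
```

Everything the example creates, including the trigger's audit rows, lives in the one `shop.db` file, which can be copied or moved like any other file.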

Cassandra

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added to and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.
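The partitioning described above is commonly explained as a token ring. The following toy sketch assumes one token per node, no replication, and MD5 as a stand-in hash (Cassandra's default partitioner uses Murmur3, and real clusters typically assign several virtual-node tokens per machine); `TokenRing` is a hypothetical class. It shows why adding a node only moves the keys that fall into the new node's token range.

```python
import bisect
import hashlib
from typing import Iterable


def token(value: str) -> int:
    """Map a string to a 64-bit token (MD5 here, as a stand-in for Murmur3)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Each node owns keys whose token lies after the previous node's token, up to its own."""

    def __init__(self, nodes: Iterable[str] = ()) -> None:
        self._ring: list[tuple[int, str]] = []  # sorted (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        bisect.insort(self._ring, (token(node), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(t, n) for t, n in self._ring if n != node]

    def owner(self, key: str) -> str:
        """Walk clockwise from the key's token to the first node token, wrapping around."""
        if not self._ring:
            raise LookupError("the ring has no nodes")
        i = bisect.bisect_left(self._ring, (token(key), ""))
        return self._ring[i % len(self._ring)][1]
```

For example, after recording `ring.owner(k)` for a set of keys on a three-node ring and then calling `ring.add_node("node-d")`, every key whose owner changed now belongs to `node-d`, and all other keys stay where they were; removing `node-d` again restores the original assignment.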

Memcached

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

MariaDB

Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most featureful, stable, and sanely licensed open SQL server in the industry. MariaDB is designed as a drop-in replacement for MySQL(R) with more features, new storage engines, fewer bugs, and better performance.

RethinkDB

RethinkDB is built to store JSON documents and scale to multiple machines with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by, and it is easy to set up and learn.

ArangoDB

A distributed free and open-source database with a flexible data model for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.
